HETEROGENEOUS CHANGE POINT INFERENCE


FLORIAN PEIN¹, HANNES SIELING¹ AND AXEL MUNK¹,²

Abstract. We propose H-SMUCE (heterogeneous simultaneous multiscale change-point
estimator) for the detection of multiple change-points of the signal in a heterogeneous
gaussian regression model. A piecewise constant function is estimated by minimizing
the number of change-points over the acceptance region of a multiscale test which
locally adapts to changes in the variance. The multiscale test is a combination of local
likelihood ratio tests which are properly calibrated by scale dependent critical values in
order to keep a global nominal level $\alpha$, even for finite samples.

We show that H-SMUCE controls the error of over- and underestimation of the number
of change-points. To this end, new deviation bounds for F-type statistics are derived.
Moreover, we obtain confidence sets for the whole signal. All results are non-asymptotic
and uniform over a large class of heterogeneous change-point models. H-SMUCE is fast
to compute, achieves the optimal detection rate and estimates the number of
change-points at almost optimal accuracy for vanishing signals, while still being robust.

We compare H-SMUCE with several state-of-the-art methods in simulations and analyse
current recordings of a transmembrane protein in the bacterial outer membrane with
pronounced heterogeneity for its states. An R-package is available online.


1. Introduction 

1.1. Change-point regression. Multiple change-point detection is a long standing task 
in statistical research and related areas. One of the most fundamental models in this 
context is (homogeneous) gaussian change-point regression 

(1.1)    $Y_i = \mu(i/n) + \sigma \varepsilon_i, \quad i = 1, \ldots, n.$

Here, $Y = (Y_1, \ldots, Y_n)$ denotes the observations, $\mu$ is an unknown piecewise constant mean
function, $\sigma^2$ a constant (homogeneous) variance and $\varepsilon_1, \ldots, \varepsilon_n$ are independent standard
gaussian distributed errors. For simplicity, we restrict ourselves in this paper to an
equidistant sampling scheme $x_{i,n} = i/n$, but extensions to other designs are straightforward.

¹Institute for Mathematical Stochastics, Georg-August-University of Göttingen,
Goldschmidtstrasse 7, 37077 Göttingen

²Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen
E-mail address: {fpein, hsieling, munk}@math.uni-goettingen.de.

Date: February 8, 2016.

2010 Mathematics Subject Classification. 62G08, 62G15, 90C39.

Key words and phrases. Change-point regression, deviation bounds, dynamic programming, heterogeneous
noise, honest confidence sets, ion channel recordings, multiscale methods, robustness, scale dependent
critical values.




The literature on methods for estimating the change-points in (1.1) and in related models is
vast, see for instance (Yao, 1988; Donoho and Johnstone, 1994; Csörgő and Horváth, 1997;
Bai and Perron, 1998; Braun et al., 2000; Birgé and Massart, 2001; Kolaczyk and Nowak, 2005;
Boysen et al., 2009; Harchaoui and Lévy-Leduc, 2010; Jeng et al., 2010; Killick et al., 2012;
Rigollet and Tsybakov, 2012; Zhang and Siegmund, 2012; Fryzlewicz, 2014) and the
references in these papers.

A crucial condition in most of the aforementioned papers is the assumption of homogeneous
noise, i.e. a constant variance in (1.1). In many applications, however, this
assumption is violated and the variance varies, for instance over time. This problem
arises for instance in the analysis of array CGH data, see (Muggeo and Adelfio, 2011;
Arlot and Celisse, 2011). Further examples include economic applications, e.g. the real
interest rate is modelled by Bai and Perron (2003) as piecewise linear regression with
covariates and heterogeneous noise. In this paper we will discuss an example from
membrane biophysics, the recordings of ion channels. It is well known that the
noise of the open state can be much larger than the background noise, see (Sakmann and
Neher, 1995, Section 3.4.4) and the references therein, rendering the different states a
potential source of variance heterogeneity.

To illustrate the effects of missing heterogeneity we show in Figure 1 a reconstruction by
SMUCE (Frick et al., 2014), a method that has been designed for homogeneous noise.
The constant variance assumption of SMUCE leads to an overestimation of the standard
deviation (which is pre-estimated by a global IQR type estimator) in the first half and
an underestimation in the second half. Therefore, in Figure 1 SMUCE misses the first
change-point and includes artificial change-points in the second half to compensate for
the too small variance it is forced to use, see also (Zhou, 2014). Note that this flaw is
not a particular feature of SMUCE; it will occur for any sensible segmentation method
which relies on a constant variance assumption. Hence, from Figure 1 the fundamental
difficulty of the heterogeneous (multiscale) change-point regression problem becomes
apparent: how to decide whether a change in the fluctuations of the data results from
high-frequency changes in the mean $\mu$ or merely from an increase of the noise level?
Apparently, if changes can occur on any scale (i.e. the length of an interval of neighbouring
observations) this is a notoriously difficult issue and proper separation of signal and noise
cannot be performed without extra information.

Indeed, the basis of the presented theory is that often a reasonable assumption is to
exclude changes of the variance in constant segments of $\mu$ (see Section 1.2). Under this
relatively weak assumption, we show in this paper that estimation of $\mu$ for heterogeneous
data in a multiscale fashion becomes indeed feasible. In addition, we also aim for a
method which is robust when changes in the variance occur at locations where the
signal is constant, as we believe that this cannot be excluded in many practical cases. To
this end, we introduce a new estimator H-SMUCE (heterogeneous simultaneous multiscale
change-point estimator) which recovers the signal under heterogeneous noise over a
broad range of scales, controls the family-wise error rate to overestimate the number of
change-points, allows for confidence statements for all unknown quantities, obeys certain
statistical optimality properties, and can be efficiently computed. At the same time it
is robust against heterogeneous noise on constant signal segments, which as a byproduct
reveals it also as robust against more heavily tailed errors.

Software for SMUCE: http://cran.r-project.org/web/packages/stepR, v. 1.0-3, 2015-06-18

Figure 1. Illustration of missing heterogeneity. (a) True (black) and estimated (blue) standard
deviation. (b) Simulated observations (black dots) together with the true signal (black line), the
confidence band (grey), the confidence intervals for the change-point locations (brackets and thick
lines), estimated change-point locations (red dashes) as well as the estimates by H-SMUCE (red
dotted line) and by SMUCE (blue dashed line), both with $\alpha = 0.1$.

1.2. Heterogeneous change-point model. To be more specific, from now on we
consider the heterogeneous gaussian change-point model

(1.2)    $Y_i = \mu(i/n) + \sigma(i/n) \varepsilon_i, \quad i = 1, \ldots, n,$

where now the variance $\sigma^2$ is also given by an unknown piecewise constant function. For
the following theoretical results we assume that it can only have change-points at
the same locations as the mean function $\mu$. In other words, $(\mu, \sigma^2)$ is a pair of unknown
piecewise constant functions in


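(1.3)    $\mathcal{S} := \Big\{ (\mu, \sigma^2) : [0,1] \to \mathbb{R} \times \mathbb{R}_+,\ \mu = \sum_{k=0}^{K} m_k \mathbb{1}_{[\tau_k, \tau_{k+1})},\ \sigma^2 = \sum_{k=0}^{K} s_k^2 \mathbb{1}_{[\tau_k, \tau_{k+1})},\ K \in \mathbb{N} \Big\},$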


with unknown change-point locations $\tau_0 = 0 < \tau_1 < \cdots < \tau_K < 1 = \tau_{K+1}$ for some
unknown number of change-points $K \in \mathbb{N}$ and also unknown function values $m_k \in \mathbb{R}$
and $s_k^2 \in \mathbb{R}_+$ of $\mu$ and $\sigma^2$. For technical reasons, we define $\mu(1)$ and $\sigma^2(1)$ by continuous
extension of $\mu$ and $\sigma^2$, respectively. For identifiability of $\mu$ we assume $m_k \neq m_{k+1}$ for all
$k = 0, \ldots, K-1$ and exclude isolated changes in the signal by assuming that $\mu : [0,1] \to \mathbb{R}$ is a
right continuous function. It is important to stress that in (1.3) we allow the variance to
potentially have changes at the locations of the changes of the signal, but the variance
need not necessarily change when $\mu$ changes, as we do not assume $s_k^2 \neq s_{k+1}^2$. In
particular, homogeneous observations are still part of the model. Conversely,
we assume that the variance may not change within a constant segment of $\mu$,
i.e. the local signal to noise ratio is assumed to be constant on $[\tau_k, \tau_{k+1})$ for
all $k = 0, \ldots, K$. We argue that this is a reasonable assumption in many applications
(recall the examples given above and see our data example), since a change-point
typically represents a change of the condition of the underlying state. Moreover,
for example, in many engineering applications a locally constant signal to noise ratio is
assumed (Guillaume et al., 1990), which motivates our modelling as well. However, we
stress that the restriction to model (1.3) is only required for our theory. For the practical
application we will show in simulations in Section 4.2 that H-SMUCE is in addition robust
against a violation of this assumption (i.e. when a variance change may occur without a
signal change) and hence still works well in the general heterogeneous change-point model
(1.2) with arbitrary variance changes.
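To make the model concrete, the following minimal sketch draws observations from (1.2) with a pair $(\mu, \sigma^2)$ in the class (1.3), i.e. mean and standard deviation are piecewise constant and share their change-points. It is an illustration only and not part of the authors' R-package; the function name `simulate_heterogeneous` and its arguments are hypothetical.

```python
import numpy as np

def simulate_heterogeneous(n, change_points, means, sds, seed=0):
    """Draw Y_i = mu(i/n) + sigma(i/n) * eps_i for i = 1..n (hypothetical helper).

    mu and sigma are piecewise constant on [tau_k, tau_{k+1}) and share the
    change-points tau_1 < ... < tau_K, as in the class (1.3).
    """
    tau = np.concatenate(([0.0], np.asarray(change_points, dtype=float), [1.0]))
    if np.any(np.diff(tau) <= 0):
        raise ValueError("need 0 < tau_1 < ... < tau_K < 1")
    if len(means) != len(tau) - 1 or len(sds) != len(tau) - 1:
        raise ValueError("need one mean and one standard deviation per segment")
    x = np.arange(1, n + 1) / n  # equidistant design x_i = i/n
    # Segment index k with tau_k <= x < tau_{k+1} (right continuity);
    # x = 1 is assigned to the last segment (continuous extension).
    seg = np.minimum(np.searchsorted(tau, x, side="right") - 1, len(means) - 1)
    mu = np.asarray(means, dtype=float)[seg]
    sigma = np.asarray(sds, dtype=float)[seg]
    rng = np.random.default_rng(seed)
    y = mu + sigma * rng.standard_normal(n)
    return x, mu, sigma, y

# One change-point at 0.5 where both the mean and the noise level change.
x, mu, sigma, y = simulate_heterogeneous(100, [0.5], means=[0.0, 1.0], sds=[1.0, 2.0])
```

Setting all entries of `sds` equal recovers the homogeneous model (1.1), which is contained in (1.3).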


1.3. Heterogeneous change-point regression. To the best of our knowledge there are
only few methods which explicitly take into account the heterogeneity of the noise in
change-point regression, either in the model considered here or in related models. Here,
we have to distinguish two settings.

In the first setting, changes in the variance are also considered as relevant structural
changes of the underlying data (even when the mean does not change), and one seeks changes
in the mean and in the variance, respectively. In this spirit are local search methods, such as
binary segmentation (BS) (Scott and Knott, 1974; Vostrikova, 1981) (if the corresponding
single change-point detection method takes the heterogeneous variance into account),
but also global methods can achieve this goal, e.g. PELT (Killick et al., 2012). For a
Bayesian approach in this context see (Du et al., 2015) and the references therein. In
addition, methods which search for more general structural changes in the distribution
potentially apply to this setup as well, see e.g. (Csörgő and Horváth, 1997; Arlot et al.,
2012; Matteson and James, 2014).


This is in contrast to the setting we address in this paper: the variance is considered as
a nuisance parameter and we primarily seek changes in the signal $\mu$. Hence, we aim
for statistically efficient estimation of the mean function, while still being robust against
heterogeneous noise. Obviously, this cannot be achieved by methods addressing the first
setting. Although of great practical relevance, this situation has only rarely been
considered and in particular no theory exists, to our knowledge. The cross-validation method
LOOVF (Arlot and Celisse, 2011) and cumSeg (Muggeo and Adelfio, 2011) have been
designed specifically to be robust against heterogeneous noise. Moreover, circular
binary segmentation (CBS), see (Venkatraman and Olshen, 2007), also applies to this setting.

For a better understanding of the problem considered here it is illustrative to distinguish
our setting further, namely from the case when it is known beforehand that changes in the
variance will necessarily occur with changes in the signal. This will potentially increase
the detection power, as under this assumption variance changes can be used for finding
signal changes as well. The information gain due to the variance changes for this case
has been recently quantified by Enikeeva et al. (2015) in terms of the minimax detection
boundary for single vanishing signal bumps of size $\delta_n \searrow 0$. More precisely, if the base line
variance is $\sigma_0^2$ and the variance at the bump is $\sigma_0^2 + \sigma_n^2$, then the constant in the minimax
detection boundary is $b = \sqrt{2}\,\sigma_0 \sqrt{2/(2 + c^2)}$ for $c = \lim_{n \to \infty} \sigma_n/\delta_n$, see (Enikeeva
et al., 2015, Theorems 3.1-3.3). For the particular case of homogeneous variance, i.e.
$\sigma_n^2 = 0$, we obtain $b = \sqrt{2}\,\sigma_0$ and the factor $\sqrt{2/(2 + c^2)} = 1$ becomes maximal, see also
(Dümbgen and Walther, 2008; Frick et al., 2014). This reflects that no additional
information on the location of a change can be gained from the variance in the homogeneous
case. Comparing this to the inhomogeneous case we see that when the variance change
is known to be large enough, i.e. $c > 0$, additional information for the
signal change can be gained from the variance change, as then $b < \sqrt{2}\,\sigma_0$, provided it is
known that signal and variance change simultaneously.

In contrast, in the present setting the variance need not necessarily change when the
signal changes, hence the "worst case" of no variance change from above is contained in our
model, which lower bounds the detection boundary. The situation is further complicated
by the fact that missing knowledge of a variance change can potentially even have
an adverse effect, because in model (1.3) detection power will potentially be decreased
further as the nuisance parameter $\sigma^2(\cdot)$ hinders estimation of change-points of $\mu$. For this
situation the optimal minimax constants are unknown to us, but from the fact that the
model with a constant variance is a submodel of our model (1.3) it immediately follows
that the minimax constant for a single bump has to be at least $\sqrt{2}\,\sigma_0$. This will allow
us to show that H-SMUCE attains the same optimal minimax detection rate as for the
homogeneous case, with $4\sigma_0$ instead of $\sqrt{2}\,\sigma_0$ as the constant appearing in the minimax
detection boundary. Remarkably, the only extra assumption we have to suppose is that
signal and variance have to be constant on segments at least of order $\log(n)/n$, see
Theorem 3.10. This reflects the additional difficulty to separate signal and noise
levels "locally" in a multiscale fashion. In other words, when we assume that the number of i.i.d.
neighbouring observations (no change in signal and variance) in each segment is at least
of order $\log(n)$, separation of signal and noise will be done by H-SMUCE in an optimal
way (possibly up to a constant).

1.4. Heterogeneous change-point inference. We define H-SMUCE as the multiscale
constrained maximum likelihood estimator restricted to all solutions of the following
optimisation problem

(1.4)    $\operatorname{argmin}_{\mu \in \mathcal{M}} |\mathcal{J}(\mu)| \quad \text{s.t.} \quad \max_{[i/n,\,j/n]:\ \mu \text{ constant on } [i/n,\,j/n]} \big[ T_i^j(Y, \mu([i/n, j/n])) - q_{ij} \big] \le 0,$

see also (Boysen et al., 2009; Davies et al., 2012; Frick et al., 2014) for related approaches.
Here, $\mathcal{M}$ (as a subset of $\mathcal{S}$) is the set of all piecewise constant mean functions, $|\mathcal{J}(\mu)|$ the
cardinality of the set of change-points of $\mu$ and the right hand side of (1.4) a multiscale
constraint to be explained now. Given a candidate function $\mu$ this tests simultaneously
over the system of all intervals on which $\mu$ is constant, whether its function value
$\mu([i/n, j/n])$ is the mean value of the observations on the respective interval $[i/n, j/n]$.
In order to perform each test, i.e. to decide whether the observations $Y_i, \ldots, Y_j$ have
constant mean $\mu([i/n, j/n])$, the local log-likelihood-ratio statistic

(1.5)    $T_i^j(Y, \mu([i/n, j/n])) := (j - i + 1) \frac{\big(\bar{Y}_{ij} - \mu([i/n, j/n])\big)^2}{2 \hat{s}_{ij}^2},$

with $\bar{Y}_{ij} := (j - i + 1)^{-1} \sum_{l=i}^{j} Y_l$ and local variance estimate $\hat{s}_{ij}^2 := (j - i)^{-1} \sum_{l=i}^{j} (Y_l - \bar{Y}_{ij})^2$,
is compared with a local threshold $q_{ij}$ in a multiscale fashion, to be discussed now.

In what follows, we restrict the multiscale test to intervals in the dyadic partition

(1.6)    $\mathcal{D} := \bigcup_{k=1}^{d_n} \mathcal{D}_k,$

where $d_n := \lfloor \log_2(n) \rfloor$ is the number of different scales and

(1.7)    $\mathcal{D}_k := \bigcup_{l=1}^{\lfloor n/2^k \rfloor} \Big\{ \Big[ \frac{1 + (l - 1) 2^k}{n}, \frac{l\, 2^k}{n} \Big] \Big\}$

the set of intervals from the dyadic partition with length $n^{-1} 2^k$. This allows fast
computation and simplifies the asymptotic analysis. Nevertheless, our methodology can be
adapted to other interval systems, see Remark 2.2.
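As an illustration of the objects just introduced, the following minimal sketch enumerates the index ranges of the dyadic partition in (1.6)-(1.7) and evaluates a local statistic of the type (1.5) on one of them. It assumes the Gaussian log-likelihood-ratio normalisation with $2\hat{s}_{ij}^2$ in the denominator; the function names are hypothetical and the code is not taken from the authors' software.

```python
import numpy as np

def dyadic_intervals(n):
    """Yield (k, i, j): scale k and 1-based index range i..j of each interval in D."""
    d_n = n.bit_length() - 1  # floor(log2(n)) for a positive integer n
    for k in range(1, d_n + 1):
        length = 2 ** k
        for l in range(1, n // length + 1):
            yield k, 1 + (l - 1) * length, l * length

def local_statistic(y, i, j, mu):
    """Local statistic on Y_i..Y_j for candidate mean mu (assumed normalisation)."""
    seg = np.asarray(y[i - 1 : j], dtype=float)
    mean = seg.mean()
    var = seg.var(ddof=1)  # divides by (j - i), the local variance estimate
    return len(seg) * (mean - mu) ** 2 / (2.0 * var)

# For n = 8 there are three scales with 4, 2 and 1 intervals, respectively.
intervals = list(dyadic_intervals(8))
```

Because every scale $k$ contributes at most $n/2^k$ intervals, the whole system has fewer than $n$ intervals, which is what makes the restriction to $\mathcal{D}$ computationally attractive.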


It remains to determine thresholds $q_{ij}$ for $\mathcal{D}$ in (1.6) that combine the local tests
appropriately. To this end, note that logarithmic (or related) scale penalisation as in the
homogeneous case (Dümbgen and Spokoiny, 2001; Dümbgen and Walther, 2008; Frick
et al., 2014) no longer balances scales appropriately in the heterogeneous case. In
particular, this will give a multiscale statistic which diverges, since due to the local
variance estimation the test statistic fails to have subgaussian (but still has subexponential)
tails. To overcome this burden we introduce in Section 2 scale dependent critical values
such that the multiscale test has global significance level $\alpha$, see (2.3). To this end, the
different scales are balanced appropriately by weights $\beta_1, \ldots, \beta_{d_n}$, with $d_n := \lfloor \log_2(n) \rfloor$,
see (2.4) and (2.5). More precisely, these weights determine the ratios between the rejection
probabilities of the multiscale test on a corresponding scale. Existence and uniqueness of
the so defined scale dependent critical values is shown in Lemma 2.1 and explicit bounds
are given in Lemma 3.1. The weights also allow to incorporate prior scale information,
see Section 3.4.

Using the so obtained thresholds allows us to obtain several confidence statements which
are a main feature of H-SMUCE. First of all, we show in Section 3.1 that the probability
to overestimate the number of change-points is bounded by the significance level $\alpha$
uniformly over $\mathcal{S}$ in (1.3), $\mathbb{P}(\hat{K} > K) \le \alpha$, see Theorem 3.3. More specifically, we show the
overestimation bound

(1.8)    $\sup_{(\mu, \sigma^2) \in \mathcal{S}} \mathbb{P}_{(\mu, \sigma^2)}\big(\hat{K} > K + 2k\big) \le \alpha^{k+1} \quad \forall\, k \in \mathbb{N}_0,$

see Theorem 3.4. In Theorem 3.5 we provide an exponential bound for the underestimation
of the number of change-points by H-SMUCE, $\mathbb{P}(\hat{K} < K)$. To this end, we show new
exponential deviation bounds for F-statistics (Section C.3), which might be of interest
in their own right. Combining the over- and the underestimation bound provides upper bounds
for the errors $\mathbb{P}(\hat{K} \neq K)$ and $\mathbb{E}[|\hat{K} - K|]$. For a fixed signal both bounds vanish super
polynomially in $n$ if $\alpha = \alpha_n \searrow 0$ when the weights are chosen appropriately, see Remark
3.6. Consequently, the estimated number of change-points converges almost surely to the
true number, see Theorem 3.7. Further, these exponential bounds enable us to obtain a
confidence band for the signal $\mu$ as well as confidence intervals for the locations of the
change-points, for an illustration see Figures 1 and 2. We show that the diameters of
these confidence intervals decrease asymptotically as fast as the (optimal) sampling rate
up to a log factor. All confidence statements hold uniformly over $\mathcal{S}_{\Delta, \lambda} \subset \mathcal{S}$, all functions
with minimal signal to noise ratio at least $\Delta$ and minimal scale $\min_{k=0,\ldots,K} (\tau_{k+1} - \tau_k)$ at least $\lambda$,
with $\Delta$ and $\lambda$ arbitrary, but fixed, see Theorem 3.8 and the results following it.


1.5. H-SMUCE in action. Figure 2 illustrates the performance of H-SMUCE in an
example with $n = 1\,000$ observations and $K = 10$ change-points. We found that H-SMUCE
misses one change-point for $\alpha = 0.1$ (as the choice $\alpha = 0.1$ tunes H-SMUCE to provide
the strong guarantee not to overestimate the number of change-points $K$ with
probability 0.9, see (1.8)), whereas for $\alpha$ between 0.15 and 0.99 (only displayed for $\alpha = 0.3$ and
$\alpha = 0.5$) the correct number of change-points is always detected (while providing a weaker
guarantee for not overestimating $K$). In addition, for $\alpha$ between 0.15 and 0.99 each true
change-point is covered by the associated confidence interval at level $1 - \alpha$. This
illustrates the influence of the significance level $\alpha$. Notably, we find that the reconstructions
are remarkably stable in $\alpha$. In fact, combining Lemma 3.1 and (A.1) shows that the width
of the confidence band is proportional to $\sqrt{\log(1/\alpha)}$, which decreases only logarithmically
for increasing $\alpha$.

Figure 2. (a) True standard deviation (black) and estimates resulting from H-SMUCE at
$\alpha = 0.1$ (red line) and $\alpha = 0.3, 0.5$ (blue line). Remaining panels: observations (black dots),
true signal (black line), confidence band (grey), confidence intervals for the change-point
locations (brackets and thick lines), estimated change-point locations (red dashes) and
estimate (red line) by H-SMUCE at given $\alpha$ and with equal weights
$\beta_1 = \cdots = \beta_{d_n} = 1/d_n$, see (2.4).


We compare the performance of H-SMUCE with CBS (Venkatraman and Olshen, 2007),
cumSeg (Muggeo and Adelfio, 2011) and LOOVF (Arlot and Celisse, 2011) in several
simulation studies in Section 4 (see also Supplement B for their performance
on the data in Figure 2), where we also examine robustness issues, see Section 4.2. In all
of these simulations and in the subsequent application H-SMUCE performs very robustly
and includes too many change-points only rarely, in accordance with (1.8).

We then apply H-SMUCE to current recordings of a transmembrane protein with
pronounced heterogeneity for its states. In contrast to segmentation methods which rely
on homogeneous noise, we found that H-SMUCE provides a reasonable reconstruction,
where all visible gating events are detected.

Finally, we stress that the confidence band and confidence intervals for the change-point
locations provided by H-SMUCE can be used to accompany any segmentation method to
assess significance of its estimated change-points. This is illustrated in the application as well.
Computation of the estimator by a pruned dynamic program and of the critical values
based on Monte-Carlo simulation is explained carefully in Supplement A. There we also
study the theoretical and empirical computation time of H-SMUCE. Due to the
underlying dyadic partition the computation of H-SMUCE is very fast, in some scenarios even
linear in the number of observations. Additional simulation results are collected in
Supplement B and all proofs are given together with some auxiliary statements in Supplement C.
An R-package is available online.

2. Scale dependent critical values 


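For the definition of H-SMUCE it remains to determine the local thresholds $q_{ij}$ in (1.4).
First of all, the multiscale test on the r.h.s. in (1.4) should be a level $\alpha$ test, i.e.

(2.1)    $\sup_{(\mu, \sigma^2) \in \mathcal{S}} \mathbb{P}_{(\mu, \sigma^2)}\Big( \max_{[i/n,\,j/n] \in \mathcal{D}:\ \mu \text{ constant on } [i/n,\,j/n]} \big[ T_i^j(Y, \mu([i/n, j/n])) - q_{ij} \big] > 0 \Big) \le \alpha.$

R-package for H-SMUCE: http://www.stochastik.math.uni-goettingen.de/hsmuce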
Here, we use the same threshold for all intervals of the same length as no a-priori
information on the change-point locations is assumed. More precisely, as we have restricted the
multiscale test to the dyadic partition in (1.6) we aim to find a vector of critical values
$q := (q_1, \ldots, q_{d_n})$, where now $q_{ij} := q_k$ if and only if $j - i + 1 = 2^k$. To this end, w.l.o.g.,
we may consider standard gaussian observations $Z_1, \ldots, Z_n$ instead of $Y_1, \ldots, Y_n$, since
the supremum in (2.1) is attained at $\mu \equiv 0$ and $\sigma^2 \equiv 1$, see the proof of Theorem 3.3. We
then define the statistics $T_1, \ldots, T_{d_n}$ with $\mathcal{D}_k$ in (1.7) as

(2.2)    $T_k := \max_{[i/n,\,j/n] \in \mathcal{D}_k} T_i^j(Z, 0) \quad \text{for } k = 1, \ldots, d_n.$

Then, the critical values $q_1, \ldots, q_{d_n}$ fulfil (2.1) if

(2.3)    $\mathbb{P}\Big( \max_{k=1,\ldots,d_n} [T_k - q_k] > 0 \Big) = 1 - F(q_1, \ldots, q_{d_n}) = \alpha,$

with $F$ the cumulative distribution function of $(T_1, \ldots, T_{d_n})$.

As the critical values $q_1, \ldots, q_{d_n}$ are not uniquely determined by (2.3) they can be chosen
to render the multiscale test particularly powerful for certain scales. To this end, we
introduce weights

(2.4)    $\beta_1, \ldots, \beta_{d_n} \ge 0, \quad \text{with } \sum_{k=1}^{d_n} \beta_k = 1,$

where $\beta_k = 0$ means to omit the $k$-th scale, i.e. $q_k = \infty$. Finally, we define $q_1, \ldots, q_{d_n}$
implicitly through

(2.5)    $\frac{1 - F_1(q_1)}{\beta_1} = \cdots = \frac{1 - F_{d_n}(q_{d_n})}{\beta_{d_n}},$

with $F_k$ the cumulative distribution function of $T_k$. If $\beta_k = 0$ the $k$-th scale does not enter
the system of equations in (2.5). The weights determine the ratios between the probabilities that
a test on a certain scale rejects, and hence regulate the allocation of the level $\alpha$ among
the single scales. In summary, the choice of the local thresholds $q_{ij}$ boils down to choosing
the significance level $\alpha$ and the weights $\beta_1, \ldots, \beta_{d_n}$; we discuss these choices in Section 3.4
more carefully. If no prior information on scales is available a default option is always to
set all weights equal, i.e. $\beta_1 = \cdots = \beta_{d_n} = 1/d_n$.


The next result shows that the vector of critical values satisfying (2.3)-(2.5) is always
well-defined.

Lemma 2.1 (Existence and uniqueness). For any $\alpha \in (0,1)$ and for any weights $\beta_1, \ldots, \beta_{d_n}$,
s.t. (2.4) holds, there exists a unique vector of critical values $q = (q_1, \ldots, q_{d_n}) \in \mathbb{R}_+^{d_n}$ which
fulfils the equations (2.3) and (2.5).
An explicit computation of the vector $q$ (or $F$) appears to be very hard, since the statistics
$T_1, \ldots, T_{d_n}$ are dependent, although the dependence structure is explicitly known.
Alternatively, it would be helpful to have an approximation for the distribution (and hence
its quantiles) of the maximum in (2.1), which, however, appears to be rather difficult as
well. For the case when $q_{ij} = q$ (which does not apply to H-SMUCE), see Davies (1987,
2002). Therefore, we determine in Section A.2 the vector $q$ by Monte-Carlo simulations.
Note that the distribution does not depend on the specific element $(\mu, \sigma^2) \in \mathcal{S}$ and hence
the critical values can be computed in a universal manner. We stress that the
determination of the scale dependent critical values is not restricted to our setting and can also
be applied to multiscale testing in other contexts. In contrast to scale penalisation, and like
the block criterion in (Rufibach and Walther, 2010), no model dependent derivations are
required and the critical values are adapted to the exact finite sample distribution of the
local test statistics. However, our approach additionally allows a flexible scale calibration
by the choice of the weights (see Section 3.4) and arbitrary interval sets can be used, as
the following remark points out.

Remark 2.2 (Other interval sets). H-SMUCE can be easily adjusted to other interval sets
as follows. Let $\mathcal{I}$ be an arbitrary set of intervals. Then, we replace in the definition of
H-SMUCE in (3.3) and (3.7) the set $\mathcal{D}$ by the set $\mathcal{I}$ and the vector $(T_1, \ldots, T_{d_n})$ by the
vector $(T_2, \ldots, T_n)$ (empty scales should be omitted) in Section 2 with

(2.6)    $T_k := \max_{[i/n,\,j/n] \in \mathcal{I},\ j - i + 1 = k} T_i^j(Z, 0).$

Again it remains to choose the significance level $\alpha \in (0,1)$ and the weights $\beta_2, \ldots, \beta_n$
to determine the critical values required for H-SMUCE. Note, however, that the critical
values and their bounds in Lemma 3.1, and therefore the results in Section 3 (apart from
Theorems 3.3 and 3.4), will depend on the specific system $\mathcal{I}$ and have to be computed for
each $\mathcal{I}$ separately.

Employing a larger interval set than $\mathcal{D}$ may lead to a better detection power, but at the
price of a larger computation time. Hence, in practice, a trade-off between computational
and statistical efficiency may guide this choice as well. Besides the dyadic partition, our
R-package also includes the system of all intervals (of order $O(n^2)$, statistically most
efficient, but computationally expensive) and the system of all intervals of dyadic length
($O(n \log(n))$, intermediate efficiency and computational time). Interesting choices might
also be approximating sets like $\mathcal{I}_{\mathrm{app}}$ introduced in (Walther, 2010; Rivera and Walther,
2013) which are larger than the dyadic partition, but achieve the minimax boundary in
the context of density estimation.
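The calibration described in this section can be sketched numerically. The following minimal example, which is not the authors' implementation from Section A.2, approximates critical values for the dyadic partition by Monte-Carlo simulation: the per-scale rejection probabilities are set to $c\,\beta_k$ as in (2.5), and the common factor $c$ is found by bisection so that the simulated global level in (2.3) does not exceed $\alpha$. All function names are hypothetical, strictly positive weights are assumed, and the local statistic uses the $2\hat{s}_{ij}^2$ normalisation.

```python
import numpy as np

def dyadic_scale_stats(z):
    """Return (T_1, ..., T_dn): per-scale maxima of the local statistic with mu = 0."""
    n = len(z)
    d_n = n.bit_length() - 1  # floor(log2(n))
    stats = np.empty(d_n)
    for k in range(1, d_n + 1):
        length = 2 ** k
        m = n // length
        blocks = np.asarray(z[: m * length], dtype=float).reshape(m, length)
        means = blocks.mean(axis=1)
        variances = blocks.var(axis=1, ddof=1)
        stats[k - 1] = np.max(length * means ** 2 / (2.0 * variances))
    return stats

def monte_carlo_critical_values(n, alpha, weights=None, reps=2000, seed=0):
    """Approximate q_1..q_dn solving (2.3) and (2.5) on simulated null data."""
    rng = np.random.default_rng(seed)
    d_n = n.bit_length() - 1
    beta = np.full(d_n, 1.0 / d_n) if weights is None else np.asarray(weights, dtype=float)
    sims = np.array([dyadic_scale_stats(rng.standard_normal(n)) for _ in range(reps)])

    def thresholds(c):
        # Scale k rejects with probability c * beta_k, so q_k is a (1 - c*beta_k)-quantile.
        probs = np.clip(1.0 - c * beta, 0.0, 1.0)
        return np.array([np.quantile(sims[:, k], probs[k]) for k in range(d_n)])

    def global_level(c):
        return np.mean(np.any(sims > thresholds(c), axis=1))

    lo, hi = 0.0, 1.0 / beta.max()
    for _ in range(50):  # bisection on the common factor c
        mid = 0.5 * (lo + hi)
        if global_level(mid) > alpha:
            hi = mid
        else:
            lo = mid
    return thresholds(lo)  # simulated global level is at most alpha

q = monte_carlo_critical_values(64, alpha=0.1, reps=400, seed=1)
```

Since the null distribution does not depend on $(\mu, \sigma^2)$, such a vector only needs to be simulated once for each combination of $n$, $\alpha$ and weights and can then be stored.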
3. Theory

In this section we collect our theoretical results. We start with finite bounds for the critical
values. These will allow us to bound $\mathbb{P}(\hat{K} \neq K)$. With these bounds we obtain confidence
statements for the signal $\mu$ and its main characteristics. Finally, we investigate asymptotic
detection rates of H-SMUCE for vanishing signals.

3.1. Finite bounds for over- and underestimation. In the following we require upper
bounds for the critical values, since the definition of the critical values by the equations
(2.3)-(2.5) is implicit.


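Lemma 3.1 (Bound on critical values). Let $q = (q_1, \ldots, q_{d_n})$ be the vector of critical
values defined by (2.3)-(2.5), then for every $k \in \{2, \ldots, d_n\}$ such that

(3.1)    $2^{-k} \log\Big( \frac{n}{2^k \alpha \beta_k} \Big) \le \frac{1}{2}$

we have

(3.2)    $q_k \le 8 \log\Big( \frac{n}{2^k \alpha \beta_k} \Big).$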


Remark 3.2. The log term of the bound (3.2) can be split into a scale dependent penalty
term $\log(n 2^{-k})$, which is of the same order as the penalties in the homogeneous case in
(Dümbgen and Spokoiny, 2001; Frick et al., 2014), and into the term $\log((\alpha \beta_k)^{-1})$, which
incorporates the significance level $\alpha$ and the weight $\beta_k$.


The following theorem shows that the significance level $\alpha$ controls the probability to
overestimate the number of change-points.

Theorem 3.3 (Overestimation control I). Assume the heterogeneous gaussian
change-point model (1.2). Let $K := |\mathcal{J}(\mu)|$ be the number of change-points of a signal $\mu \in \mathcal{M}$.
Let further $\hat{K}$ be the estimated number of change-points by H-SMUCE, i.e.

(3.3)    $\hat{K} := \min \Big\{ |\mathcal{J}(\mu)| : \mu \in \mathcal{M} \text{ with } \max_{[i/n,\,j/n] \in \mathcal{D}:\ \mu \text{ constant on } [i/n,\,j/n]} \big[ T_i^j(Y, \mu([i/n, j/n])) - q_{ij} \big] \le 0 \Big\}.$

Then, for any vector of critical values $q$ with significance level $\alpha \in (0,1)$ and weights
$\beta_1, \ldots, \beta_{d_n}$ in (2.3)-(2.5), uniformly over $\mathcal{S}$ in (1.3) it holds

    $\sup_{(\mu, \sigma^2) \in \mathcal{S}} \mathbb{P}_{(\mu, \sigma^2)}\big(\hat{K} > K\big) \le \alpha.$

The theorem gives us a direct interpretation of the parameter $\alpha$ as the probability to
overestimate the number of change-points. This even holds locally, i.e. on every union of
adjoining segments of the estimator H-SMUCE with probability $1 - \alpha$ there are at least
as many change-points as detected. Moreover, we strengthen the result by showing that
the probability to estimate additional changes decays exponentially fast and hence the
expected overestimation is small.


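Theorem 3.4 (Overestimation control II). Under the assumptions of Theorem 3.3, we
have

    $\sup_{(\mu, \sigma^2) \in \mathcal{S}} \mathbb{P}_{(\mu, \sigma^2)}\big(\hat{K} > K + 2k\big) \le \alpha^{k+1} \quad \forall\, k \in \mathbb{N}_0.$

Moreover,

    $\sup_{(\mu, \sigma^2) \in \mathcal{S}} \mathbb{E}_{(\mu, \sigma^2)}\big[(\hat{K} - K)_+\big] \le \frac{2\alpha}{1 - \alpha}.$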
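The step from a tail bound to an expectation bound in Theorem 3.4 can be checked by a short calculation. The following derivation is a sketch added for illustration; it assumes the tail bound in the form $\mathbb{P}(\hat{K} > K + 2k) \le \alpha^{k+1}$ for all $k \in \mathbb{N}_0$ and uses only that $\hat{K} - K$ is integer valued.

```latex
\begin{align*}
\mathbb{E}\big[(\hat{K} - K)_+\big]
  &= \sum_{j \ge 1} \mathbb{P}\big(\hat{K} - K \ge j\big) \\
  &= \sum_{k \ge 0} \Big[ \mathbb{P}\big(\hat{K} - K \ge 2k + 1\big)
       + \mathbb{P}\big(\hat{K} - K \ge 2k + 2\big) \Big] \\
  &\le \sum_{k \ge 0} 2\, \mathbb{P}\big(\hat{K} > K + 2k\big) \\
  &\le 2 \sum_{k \ge 0} \alpha^{k+1} = \frac{2\alpha}{1 - \alpha}.
\end{align*}
```

For instance, $\alpha = 0.1$ gives an expected overestimation of at most $0.2/0.9 \approx 0.22$ change-points.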


To control the probability $\mathbb{P}(\hat{K} \neq K)$ we additionally need an upper bound for the
probability to underestimate $K$. Unlike the overestimation bounds in Theorems
3.3 and 3.4, the probability to underestimate cannot be bounded uniformly over $\mathcal{S}$, since
size and scale of changes could be arbitrarily small. This is made more precise in Theorem
3.10, which gives the detection boundary in terms of the smallest (standardized) jump size
$\Delta$ and the smallest scale $\lambda$. The next theorem provides an exponential bound uniformly
over the subset

(3.4)    $\mathcal{S}_{\Delta, \lambda} := \Big\{ (\mu, \sigma^2) \in \mathcal{S} : \Delta \le \inf_{1 \le k \le K} \frac{|m_k - m_{k-1}|}{\max(s_{k-1}, s_k)},\ \lambda \le \inf_{0 \le k \le K} |\tau_{k+1} - \tau_k| \Big\}$

with $\Delta, \lambda > 0$ arbitrary, but fixed.


Theorem 3.5 (Underestimation control). Let $\mathcal{S}_{\Delta, \lambda}$ be as in (3.4) with $\Delta, \lambda > 0$ arbitrary,
but fixed, and $k_n := \lfloor \log_2(n\lambda/4) \rfloor$. We define

5 \ -I 2 


T] : = 


l-3exp|-- 


nAA^ 

32 


- A/16log 



Under the assumptions of Theorem 3.3 and if n\ > 32 and 


(nA) ^ log 
are satisfied, then uniformly in 5a,a 
( 3.5) P(M,a 2 ) (k<K^<l-T]^ and 


Aa/3fc„ 


< 


512 


K-K 


<K{l-ri). 


Roughly speaking, H-SMUCE detects any change-point of the signal $\mu$ under the assumptions
of Theorem 3.5 at least with probability $\eta$. A sharper version with change-point specific
probabilities is given in Theorem C.5 in the supplement. Such a result clarifies the
dependence on the different weights, but is technically much more involved. Combining Theorems
3.3, 3.4 and 3.5 gives upper bounds for the probability $\mathbb{P}(\hat{K} \neq K)$ that H-SMUCE
misspecifies the number of change-points and for the expected error $\mathbb{E}[|\hat{K} - K|]$.
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Remark 3.6 (Vanishing errors). For a fixed signal (fixed A and A are sufficient) both 
errors vanish asymptotically if a = «„—?• 0 is chosen such that \og{anlikn,n)/n —)■ 0, with 


triangular scheme ..., I3d„,n for the weights in (2.5). We can achieve a rate arbitrary 
close to the exponential rate by the choice an = exp(—n/r„), with —>■ oo arbitrarily 
slow. The condition on the sequence (3k„,n allows a variety of possible choices of the 
weights, too. For instance, the choice = ■ ■ ■ = [3d„,n = 1/dn, which weights all scales 
equally, fulfils this condition. 
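The condition of Remark 3.6 can be checked numerically for a concrete choice. The sketch below is an illustration only: it assumes $r_n = \log(n)$, equal weights $1/d_n$, and, as a hypothetical stand-in for the number of scales, $d_n = \lfloor \log_2(n) \rfloor$. It works on the log scale, because $\alpha_n = \exp(-n/r_n)$ underflows quickly in floating point.

```python
import math


def log_rate(n: int) -> float:
    """Return log(alpha_n * beta_{k_n,n}) / n for alpha_n = exp(-n / r_n).

    Assumptions (ours): r_n = log(n), equal weights 1/d_n, d_n = floor(log2(n)).
    """
    r_n = math.log(n)
    d_n = max(1, math.floor(math.log2(n)))
    log_alpha = -n / r_n          # log(alpha_n)
    log_beta = -math.log(d_n)     # log(1 / d_n)
    return (log_alpha + log_beta) / n


for n in (10**2, 10**4, 10**6, 10**8):
    print(n, log_rate(n))
```

The printed values are negative and shrink towards zero, at the slow rate $1/r_n$, which is the behaviour the remark requires.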

A direct consequence is the strong model consistency of H-SMUCE. 


Theorem 3.7 (Strong model consistency). Assume the setting of Theorem 3.3 and let $(\hat{K}_n)_n$ be the sequence of estimated numbers of change-points by H-SMUCE, where $\hat{K}_n$ is as $\hat{K}$ with significance level $\alpha_n$ and corresponding weights $\beta_{1,n}, \ldots, \beta_{d_n,n}$. Moreover, let $\mathcal{S}_{\Delta,\lambda}$ be as in (3.4) with $\Delta, \lambda > 0$ arbitrary, but fixed, and $k_n := \lfloor \log_2(n\lambda/4) \rfloor$. Let $\rho > 0$ be arbitrary, but fixed. If

\[
\lim_{n\to\infty} \alpha_n n^{1+\rho} = 0 \quad\text{and}\quad \lim_{n\to\infty} \frac{\log(\alpha_n\beta_{k_n,n})}{n} = 0 \tag{3.6}
\]

holds, then $\hat{K}_n \to K$, almost surely and uniformly in $\mathcal{S}_{\Delta,\lambda}$.

Again, there is a wide range of sequences $\alpha_n$ and $\beta_{k_n,n}$ that satisfy (3.6). Moreover, we still have (weak) model consistency if $\alpha_n \to 0$ and the second condition of (3.6) holds.

3.2. Confidence sets. In this section we obtain confidence sets for the signal $\mu$ and for the locations of the change-points. First, we show that the set of all solutions of (1.4)

\[
C(q) := \Bigl\{ \hat{\mu} \in \mathcal{M} \,:\, |J(\hat{\mu})| = \hat{K} \ \text{and}\ \max \bigl[ T_i^j\bigl(Y, \hat{\mu}([i/n, j/n])\bigr) - q_{ij} \bigr] \le 0 \Bigr\} \tag{3.7}
\]

is a confidence set for the unknown signal $\mu$.

Theorem 3.8 (Confidence set). Assume the setting of Theorem 3.3 and let $\mathcal{S}_{\Delta,\lambda}$ be as in (3.4) with $\Delta, \lambda > 0$ arbitrary, but fixed, and $k_n := \lfloor \log_2(n\lambda/4) \rfloor$. Let $C(\cdot)$ be as in (3.7) and $q_n$ be a vector of critical values determined by significance level $\alpha$ and weights $\beta_{1,n}, \ldots, \beta_{d_n,n}$ with $\lim_{n\to\infty} n^{-1}\log(\beta_{k_n,n}) = 0$. Then,

\[
\lim_{n\to\infty} \inf_{(\mu,\sigma^2)\in\mathcal{S}_{\Delta,\lambda}} P_{(\mu,\sigma^2)}\bigl(\mu \in C(q_n)\bigr) \ge 1 - \alpha. \tag{3.8}
\]

This shows that the asymptotic coverage of $C(q_n)$ is at least $1 - \alpha$. Lemma C.6 gives an exponential inequality similar to (3.5) which shows that $C(q_n)$ is also a non-asymptotic confidence set. We further derive from this set confidence intervals for the change-point locations.

Theorem 3.9 (Change-point locations). Assume the setting of Theorem 3.8, where $\alpha$ is replaced by a sequence $\alpha_n \to 0$. Let $c_n := r_n/n < \lambda/2$ and $k_n := \lfloor \log_2(n c_n/2) \rfloor$ s.t.
(3.9) liminf ;—^ , .A ^, and lim _ g 

Then, 


lim inf -— f— 
n-s-oo log(n) 


216 ^ log (a„/3fc„,r 

— . , ■ „ ,, and lim - 

mm(A^, 1) n^oo Tn 


(3.10) 


lim snp P(^,a2) 
(M,<x2)e5A,A 


snp 


max c, 


n 


Ffc - 41 > 1 =0. 


Here, the rate $c_n$ is equal to the sampling rate $1/n$ up to the (logarithmic) rate $r_n$ depending on the tuning parameters $\alpha_n\beta_{k_n,n}$. For example, if > 0, $r_n/\log(n) \to \infty$ is sufficient to satisfy (3.9). A non-asymptotic statement is given in Lemma C.7 in the supplement. For visualization of the confidence statements it is useful to further derive a confidence band $B(q_n)$ for the signal as in (Frick et al., 2014, Corollary 3 and the surrounding explanation). It can be shown that the collection $I(q_n) = \{\hat{K}_n, B(q_n), [L_k, R_k]_{k=1,\ldots,\hat{K}_n}\}$, with $[L_k, R_k]$ confidence intervals for the change-point locations according to Theorem 3.9, also satisfies (3.8). Recall Figures and for an illustration. It is also possible to strengthen the statements of this section to sequences of vanishing signals with $\Delta_n \to 0$ and $\lambda_n \to 0$ slowly enough, but we omit such results.

3.3. Asymptotic detection rates for vanishing signals. For the detection of a single vanishing bump against a noisy background see Theorem C.8 in the supplement. The following theorem deals with the detection of a signal with several vanishing change-points.

Theorem 3.10 (Multiple vanishing change-points). Assume the heterogeneous gaussian change-point model (1.2). Let $(K_n)_n := (|J(\mu_n)|)_n$ be the sequence of true numbers of change-points. Let further $(\hat{K}_n)_n$ be the sequence of the estimated numbers of change-points by H-SMUCE (3.3), with significance levels $\alpha_n$ and weights $\beta_{1,n}, \ldots, \beta_{d_n,n}$. Let $\mathcal{S}_{\Delta_n,\lambda_n} \subset \mathcal{S}$ be a sequence of submodels as in (3.4) and $k_n := \lfloor \log_2(n\lambda_n/4) \rfloor$. We further assume

\[
\liminf_{n\to\infty} \frac{n\lambda_n}{\log(n)} > 512 \quad\text{and}\quad \lim_{n\to\infty} \frac{\log(\alpha_n\beta_{k_n,n})}{n\lambda_n} = 0 \tag{3.11}
\]

as well as

(1) for large scales, i.e. $\liminf_{n\to\infty} \lambda_n > 0$, the limit $n\lambda_n\Delta_n^2 \bigl[\log\bigl(1/(\alpha_n\beta_{k_n,n})\bigr)\bigr]^{-1} \to \infty$,

(2) for small scales, i.e. $\lambda_n \to 0$, the inequality

\[
\sqrt{n\lambda_n}\,\Delta_n \ge \bigl(\sqrt{512} + C + \varepsilon_n\bigr)\sqrt{-\log(\lambda_n)} \tag{3.12}
\]




























with possibly $\varepsilon_n \to 0$, but such that $\varepsilon_n\sqrt{-\log(\lambda_n)} \to \infty$ and

y V^og{8/{an/3k„,n)) ^ 1 

limsup - , =— < , , 

n^oo 6„7-log(A0) \/^ 

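with $C = 0$ for $K_n$ bounded and $C = 16\sqrt{6}$ for $K_n$ unbounded.

Then,

\[
\lim_{n\to\infty} \sup_{(\mu_n,\sigma_n^2)\in\mathcal{S}_{\Delta_n,\lambda_n}} P_{(\mu_n,\sigma_n^2)}\bigl(\hat{K}_n < K_n\bigr) = 0.
\]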


Theorems C.8 and 3.10 state conditions on the tuning parameters $\alpha_n$ and $\beta_{k_n,n}$ as well as on the length of the minimal scale $|I_n| =: \lambda_n$ (to simplify notation we only write $\lambda_n$ in the following) and the standardized jump size $\Delta_n$ to detect the vanishing signals uniformly over $\mathcal{S}_{\Delta_n,\lambda_n}$. If, in addition, $\lim_{n\to\infty} \alpha_n = 0$ holds, then we also control the probability of overestimating the number of change-points, and therefore the estimation of the number of change-points is still consistent in the case of a vanishing signal. The main condition in both theorems is that $\sqrt{n\lambda_n}\,\Delta_n$ has to be at least of order $\sqrt{-\log(\lambda_n)}$, see (C.9) and (3.12). This is optimal in the sense that no signal with a smaller rate can be detected asymptotically with probability one, see (Dümbgen and Spokoiny, 2001; Chan and Walther, 2013; Frick et al., 2014) for the case of homogeneous observations, and note that this is a sub-model of our model. In contrast to the homogeneous case we need, in addition, that $\lambda_n$ is at least of order $\log(n)/n$, see (C.10) and (3.11). Such a restriction appears reasonable, since for the additional variance estimation only the number of observations on the segment is relevant and not the size of the change. Finally, we observe that the constants encountered in the lower detection bound for H-SMUCE in (C.9) and (3.12) increase with the difficulty of the estimation problem, where the difficulty is represented by the number of vanishing segments. All of these constants are slightly larger than the analogous constants for SMUCE in (Frick et al., 2014, Theorems 5 and 6), reflecting the additional difficulty caused by the heterogeneous noise. More precisely, we have 4 instead of the optimal $\sqrt{2}$ for one vanishing segment, $\sqrt{512}$ instead of 4 for a bounded number of vanishing segments and $\sqrt{512} + 16\sqrt{6}$ instead of 12 for an unbounded number of vanishing segments. Note again that the optimal constants for the heterogeneous case are unknown to us.
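To get a feeling for what these constants mean, the following Python sketch solves the boundary $\sqrt{n\lambda}\,\Delta = c\,\sqrt{-\log(\lambda)}$ for the standardized jump size $\Delta$. This is a rough illustration only: the theorems are asymptotic, the sketch ignores $\varepsilon_n$ and the side conditions such as (3.11), and the function name and the example values of $n$ and $\lambda$ are ours.

```python
import math


def boundary_jump(n: int, lam: float, constant: float) -> float:
    """Standardized jump size Delta solving sqrt(n*lam)*Delta = constant*sqrt(-log(lam))."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    return constant * math.sqrt(-math.log(lam)) / math.sqrt(n * lam)


# Constants quoted in the text for one vanishing segment and for a bounded number of them.
constants = {
    "homogeneous optimum, one segment": math.sqrt(2.0),
    "H-SMUCE, one segment": 4.0,
    "H-SMUCE, bounded number of segments": math.sqrt(512.0),
}
for label, c in constants.items():
    print(label, boundary_jump(n=10_000, lam=0.01, constant=c))
```

The ratio of two boundary values equals the ratio of the constants, so for one vanishing segment the price of heterogeneity in this bound is a factor $4/\sqrt{2} = 2\sqrt{2}$ in the jump size.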

3.4. Choice of the tuning parameters. In this section we discuss the choice of the tuning parameters $\alpha$ and $\beta_1, \ldots, \beta_{d_n}$.

Choice of $\alpha$. As illustrated in Figure the choice depends on the application. If a strict overestimation control of the number of change-points $K$ is desirable, $\alpha$ should be chosen small, e.g. 0.05 or 0.1, recall Theorems 3.3 and 3.4. This might come at the expense of missing change-points, but with large probability not too many change-points are detected (recall Figure and see also the simulations in Section 4). If change-point screening is the primary goal, i.e. we aim to avoid missing change-points, $\alpha$ should be increased, e.g. $\alpha = 0.5$ or even higher, since Theorem 3.5 shows that the error probability of underestimating the number of change-points decreases with increasing $\alpha$. If model selection, i.e. $\hat{K} = K$, is the major aim, an intermediate level that balances the over- and underestimation error should be chosen, e.g. $\alpha$ between 0.1 and 0.5. Both errors vanish super-polynomially for the asymptotic choice $\alpha = \alpha_n = \exp(-o(n))$, see Remark 3.6. A finite sample approach is to weight these error probabilities, $\gamma P(\hat{K} > K) + (1-\gamma)P(\hat{K} < K)$ with $\gamma \in (0,1)$, and to choose $\alpha$ such that its upper bound

\[
\gamma\alpha + (1-\gamma)\bigl(1 - \eta^{K}\bigr),
\]

with $\eta$ as defined in Theorem 3.5, is minimized. This also allows prior information on $(\Delta, \lambda)$ to be incorporated. Alternatively, the bound on the expectation $E[|\hat{K} - K|]$ obtained by combining Theorems 3.4 and 3.5 can be minimized to take the size of the misestimation into account. Despite all possibilities to choose the 'best' $\alpha$ for a given application, comparing estimates at different $\alpha$ can be helpful to trace the "stability of evidence" of the estimated change-points at different significance levels. Of course, such a "significance screening" no longer allows a frequentist interpretation of the significance level, as $\alpha$ has to be fixed in advance, see e.g. (Schervish, 1996). Nevertheless, it might give for instance some indication whether to perform further experiments. Apart from this, for a fixed $\alpha$ the confidence statements of H-SMUCE can also be used to support findings by other estimators. This is illustrated in Section 5 for the ion channel application.
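For orientation, the two finite sample criteria mentioned above can be written out as follows. This is a short worked step, assuming only the bounds of Theorems 3.3 to 3.5 in the form $P(\hat K > K) \le \alpha$, $E[(\hat K - K)_+] \le 2\alpha/(1-\alpha)$, $P(\hat K < K) \le 1 - \eta^K$ and $E[(K - \hat K)_+] \le K(1-\eta)$, where $\eta = \eta(\alpha)$ depends on $\alpha$ through the definition in Theorem 3.5.

```latex
\begin{align*}
P(\hat{K} \neq K) &= P(\hat{K} > K) + P(\hat{K} < K)
  \le \alpha + 1 - \eta^{K}, \\
\gamma P(\hat{K} > K) + (1-\gamma) P(\hat{K} < K)
  &\le \gamma\alpha + (1-\gamma)\bigl(1 - \eta^{K}\bigr), \\
E\bigl[|\hat{K} - K|\bigr] &= E\bigl[(\hat{K} - K)_+\bigr] + E\bigl[(K - \hat{K})_+\bigr]
  \le \frac{2\alpha}{1-\alpha} + K(1-\eta).
\end{align*}
```

The first term of each bound increases in $\alpha$, while the second decreases because $\eta$ increases with $\alpha$; minimizing the sum over $\alpha$ balances the two.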

Choice of $\beta_1, \ldots, \beta_{d_n}$. As a default choice we recommend equal weights $\beta_1 = \cdots = \beta_{d_n} = 1/d_n$. This choice fulfils (together with many other choices) the conditions of Theorems 3.7 and 3.8. Unlike for the significance level $\alpha$, only the bound for the underestimation of the number of change-points depends on these weights. Note that this gives the user the possibility to incorporate prior information on the scales without violating the overestimation control in Theorems 3.3 and 3.4. If for instance changes are expected to occur only on small segments, then the detection power on these scales can be increased if the first weights are chosen large and the other ones small (or even zero). In contrast, if the general signal to noise ratio is expected to be very small, then it is nearly impossible to detect changes on small scales, and larger scales should be weighted more to detect at least the changes on these scales. The quantitative influence of the weights on the detection power can be seen in the underestimation bound in Theorem C.5 in the supplement, which is a refinement of Theorem 3.5. We also investigate such choices quantitatively in simulations in Section 4.

4. Simulations


In this section we compare H-SMUCE in simulations with CBS (Venkatraman and Olshen, 2007), cumSeg (Muggeo and Adelfio, 2011) and LOOVF (Arlot and Celisse, 2011), as they are also designed to be robust against heterogeneous noise. Moreover, we include SMUCE (Frick et al., 2014) in simulations with a constant variance as a benchmark to examine how much the detection power of H-SMUCE decreases in this case, which may be regarded as the price for adaptation to heterogeneous noise. We fix the weights $\beta_1 = \cdots = \beta_{d_n} = 1/d_n$ and vary the significance level $\alpha$. A simulation with tuned weights can be found in Section B.2 in the Supplement. For circular binary segmentation (CBS) we call the function segmentByCBS with the standard parameters. For the cross-validation method LOOVF we use the Matlab function proc_LOOVF with the parameter choice of the demo file. For cumSeg we call the method jumpoints with the parameter k large enough such that the estimation is not influenced by this choice. For SMUCE we call the function smuceR with the standard parameters, in particular the interval set of all intervals is used if $n < 1\,000$.
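The equal weights used here, and the kind of scale-favouring alternative discussed in Section 3.4, can be sketched in a few lines of Python. The geometric scheme below is our own hypothetical example of "first weights large, later ones small" and is not the tuned choice of Section B.2; the only property taken from the text is that the weights are non-negative and sum to one.

```python
def equal_weights(d_n: int) -> list[float]:
    """Default choice: every scale gets weight 1/d_n."""
    return [1.0 / d_n] * d_n


def geometric_weights(d_n: int, ratio: float) -> list[float]:
    """Hypothetical alternative: weights proportional to ratio**k, k = 0, ..., d_n - 1.

    ratio < 1 puts more weight on the first (small) scales,
    ratio > 1 puts more weight on the last (large) scales.
    """
    raw = [ratio ** k for k in range(d_n)]
    total = sum(raw)
    return [r / total for r in raw]


print(equal_weights(4))
print(geometric_weights(4, 0.5))
```

Because only the underestimation bound depends on the weights, either vector can be plugged in without affecting the overestimation control at level $\alpha$.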

To avoid specific interactions between the signal and the dyadic partition we generate in each repetition a random pair $(\mu, \sigma^2) \in \mathcal{S}$ (all random variables are independent of each other).

(a) We fix the number of observations $n$, the number of change-points $K$, a constant $C$ and a minimum value for the smallest scale $\lambda_{\min}$.

(b) We draw the locations of the change-points $\tau_0 := 0 < \tau_1 < \cdots < \tau_K < 1 =: \tau_{K+1}$ uniformly distributed with the restriction that $\lambda := \min_{k=0,\ldots,K} |\tau_{k+1} - \tau_k| \ge \lambda_{\min}$.

(c) We choose the function values $s_0, \ldots, s_K$ of the standard deviation function $\sigma$ by $s_k := 2^{U_k}$, where $U_0, \ldots, U_K$ are uniformly distributed on $[-2, 2]$.

(d) We determine the function values $m_0, \ldots, m_K$ of the signal $\mu$ such that

\[
|m_k - m_{k-1}| = \sqrt{\frac{C}{n}}\, \min\Biggl( \sqrt{\frac{\tau_{k+1} - \tau_k}{s_k^2}},\ \sqrt{\frac{\tau_k - \tau_{k-1}}{s_{k-1}^2}} \Biggr)^{-1} \quad \forall\, k = 1, \ldots, K. \tag{4.1}
\]

Thereby, we start with $m_0 = 0$ and choose randomly with probability 1/2 whether the expectation increases or decreases.
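Steps (a) to (d) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used for the reported simulations. It assumes that the jump sizes follow the standardized form $|m_k - m_{k-1}| = \sqrt{C/n}\,\min\bigl(\sqrt{(\tau_{k+1}-\tau_k)/s_k^2}, \sqrt{(\tau_k-\tau_{k-1})/s_{k-1}^2}\bigr)^{-1}$ (our reading of (4.1)), that the restriction in (b) is enforced by rejection sampling, and that segments are half-open intervals $[\tau_k, \tau_{k+1})$; all function names are ours.

```python
import math
import random


def draw_change_points(K, lam_min, rng, max_tries=100_000):
    """Step (b): K uniform locations with all segment lengths >= lam_min (rejection sampling)."""
    for _ in range(max_tries):
        tau = [0.0] + sorted(rng.random() for _ in range(K)) + [1.0]
        if min(b - a for a, b in zip(tau, tau[1:])) >= lam_min:
            return tau
    raise RuntimeError("lam_min is too large for this number of change-points")


def draw_signal(n, K, C, lam_min, rng):
    """Steps (b)-(d): change-points tau, signal values m and standard deviations s."""
    tau = draw_change_points(K, lam_min, rng)
    s = [2.0 ** rng.uniform(-2.0, 2.0) for _ in range(K + 1)]       # step (c)
    m = [0.0]                                                        # step (d), m_0 = 0
    for k in range(1, K + 1):
        left = math.sqrt((tau[k] - tau[k - 1]) / s[k - 1] ** 2)
        right = math.sqrt((tau[k + 1] - tau[k]) / s[k] ** 2)
        jump = math.sqrt(C / n) / min(left, right)
        m.append(m[-1] + rng.choice((-1.0, 1.0)) * jump)            # up or down with prob. 1/2
    return tau, m, s


def simulate(n, tau, m, s, rng):
    """Observations Y_i = mu(i/n) + sigma(i/n) * eps_i with standard gaussian eps_i."""
    y, k = [], 0
    for i in range(1, n + 1):
        while k + 1 < len(m) and i / n >= tau[k + 1]:
            k += 1
        y.append(m[k] + s[k] * rng.gauss(0.0, 1.0))
    return y


rng = random.Random(1)
tau, m, s = draw_signal(n=200, K=3, C=200.0, lam_min=0.05, rng=rng)
y = simulate(200, tau, m, s, rng)
```

Rejection sampling is simple but becomes slow when $(K+1)\lambda_{\min}$ approaches one; a spacing-based construction would be preferable in that regime.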


By (4.1) we provide a situation where all change-points are similarly hard to find, recall the minimax detection boundary from Section 3.3. An example has been displayed in Figure in the introduction, where H-SMUCE misses one change-point at $\alpha = 0.1$ and detects the correct number of change-points for $\alpha$ between 0.15 and 0.99 (only displayed for $\alpha = 0.3$ and $\alpha = 0.5$). In Figure (Supplement B) we see that CBS (Venkatraman and Olshen, 2007) also finds all change-points, but detects further changes. The performance of cumSeg (Muggeo and Adelfio, 2011) and LOOVF (Arlot and Celisse, 2011) is worse: both miss several changes, and LOOVF also adds a false positive. We now examine these methods more extensively. All simulations are repeated 10 000 times.

Software used:
H-SMUCE: http://www.stochastik.math.uni-goettingen.de/hsmuce, v. 0.0.0.9000, 2015-04-15
PSCBS (CBS): http://cran.r-project.org/web/packages/PSCBS/, v. 0.40.4, 2014-02-04
CHPTCV (LOOVF): http://www.di.ens.fr/~arlot/code/CHPTCV.htm, v. 1.0, 2010-10-27
cumSeg: http://cran.r-project.org/web/packages/cumSeg/, v. 1.1, 2011-10-14
stepR (SMUCE): http://cran.r-project.org/web/packages/stepR, v. 1.0-3, 2015-06-18

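In the following we report the difference between the estimated number $\hat{K}$ and the true number $K$ of change-points as well as the mean of the absolute value of this difference. Additionally, we use the false positive sensitive location error

\[
\mathrm{FPSLE} = \frac{1}{2(\hat{K}+1)} \sum_{k=1}^{\hat{K}+1} \bigl( |\tau_{l_k - 1} - \hat{\tau}_{k-1}| + |\tau_{l_k} - \hat{\tau}_k| \bigr),
\]

with $l_k \in \{1, \ldots, K+1\}$ such that $(\hat{\tau}_{k-1} + \hat{\tau}_k)/2 \in (\tau_{l_k - 1}, \tau_{l_k}]$, i.e. the left and right neighbouring change-points to the middle point of $(\hat{\tau}_{k-1}, \hat{\tau}_k]$, and the false negative sensitive location error

\[
\mathrm{FNSLE} = \frac{1}{2(K+1)} \sum_{k=1}^{K+1} \bigl( |\tau_{k-1} - \hat{\tau}_{\hat{l}_k - 1}| + |\tau_k - \hat{\tau}_{\hat{l}_k}| \bigr),
\]

with $\hat{l}_k \in \{1, \ldots, \hat{K}+1\}$ such that $(\tau_{k-1} + \tau_k)/2 \in (\hat{\tau}_{\hat{l}_k - 1}, \hat{\tau}_{\hat{l}_k}]$, see (Futschik et al., 2014, Section 3.1), to rate the estimation of the locations of the change-points. We also show the mean integrated squared (absolute) error MISE (MIAE) for all methods.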
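The two location errors can be computed with one helper, since FNSLE is FPSLE with the roles of true and estimated change-points exchanged. The sketch below is our own illustration, not code from the paper or from Futschik et al. It assumes change-points in $(0,1)$ with boundary points $\tau_0 = \hat\tau_0 = 0$ and $\tau_{K+1} = \hat\tau_{\hat K+1} = 1$, and the normalisation by twice the number of segments.

```python
from bisect import bisect_left


def _one_sided_error(inner, outer):
    """Mean distance between each segment of `inner` and the `outer` segment containing its midpoint."""
    a = [0.0] + sorted(inner) + [1.0]
    b = [0.0] + sorted(outer) + [1.0]
    total = 0.0
    for k in range(1, len(a)):
        mid = 0.5 * (a[k - 1] + a[k])
        # Smallest l with b[l] >= mid, so that mid lies in (b[l-1], b[l]].
        l = max(1, bisect_left(b, mid))
        total += abs(b[l - 1] - a[k - 1]) + abs(b[l] - a[k])
    return total / (2.0 * (len(a) - 1))


def fpsle(true_cps, est_cps):
    """False positive sensitive location error: sums over the estimated segments."""
    return _one_sided_error(est_cps, true_cps)


def fnsle(true_cps, est_cps):
    """False negative sensitive location error: sums over the true segments."""
    return _one_sided_error(true_cps, est_cps)


print(fpsle([0.5], [0.25, 0.5]), fnsle([0.5], [0.25, 0.5]))
```

In this example, with one spurious change-point at 0.25, the FPSLE is larger than the FNSLE, in line with the remark in Section 4.1 that the FPSLE reacts strongly to overestimation.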

4.1. Simulation results. In this section we discuss the results of the simulations for model (1.2) and (1.3). We start in Table (Supplement B) with the simple setting of a single change at the midpoint, where we vary the variances on the adjoining segments. In Table (Supplement B) we display results for a constant variance and in Table (Supplement B) for heterogeneous errors. We excluded LOOVF from the simulations for larger $n$ due to its long computation time, see the run time simulations in Section A.3 in the supplement.

All simulations confirm the overestimation control at level $\alpha$ for H-SMUCE from Theorem 3.3 and the exponential decay of the overestimation probability in Theorem 3.4. The simulations with a single change-point confirm that the size of the variance change has no influence; rather, the size of the variances matters. We found that H-SMUCE performs well compared to all other methods. A small $\alpha$ avoids overestimation, but risks missing changes that are harder to detect. Thus, the comparison of the estimates of H-SMUCE for different $\alpha$ shows, in accordance with our theory, that it is reasonable to relax $\alpha$ if changes are expected to be harder to detect (recall the discussion in Section 3.4). Among the other methods, cumSeg performs best in the easier and LOOVF in the difficult scenarios, whereas CBS and in particular LOOVF show a tendency to overestimate the number of change-points.

For a constant signal (corresponding to $K = 0$ in Table) H-SMUCE overestimates the number of change-points even slightly less often than SMUCE, whereas CBS and cumSeg hardly ever overestimate. In the case of a constant variance we found that the detection power of H-SMUCE is only slightly worse than that of SMUCE for $K = 2$, although SMUCE uses the system of all intervals instead of the dyadic partition $\mathcal{D}$. The difference is larger for $K = 10$, and in this case CBS and cumSeg also perform better than H-SMUCE, since the detection power of H-SMUCE depends strongly on the lengths of the constant segments. Moreover, $\lambda_{\min}$ plays a similar role as the number of change-points $K$, since the average length of the constant segments decreases if $\lambda_{\min}$ decreases or $K$ increases. Worse results for smaller lengths are due to the familywise error control at level $\alpha$ of H-SMUCE, as it guarantees a strict control of overestimating the number of change-points.

Similar results can be observed for $n = 100$ with heterogeneous errors. CBS performs better than cumSeg and LOOVF, and in particular better than in the single change-point setting. CBS outperforms H-SMUCE for $K = 5$, although H-SMUCE has a much smaller tendency to overestimate the number of change-points, whereas in particular CBS and LOOVF tend to overestimate. This can also be seen for the MISE and MIAE, as these measures are much more affected by underestimation than by overestimation. These findings are also supported by the FPSLE and the FNSLE: the FPSLE is heavily affected by overestimation, whereas the FNSLE is larger in the case of underestimation.

In all simulations with heterogeneous errors and 1000 observations H-SMUCE outperforms the other methods; for 10 000 observations this becomes even more pronounced. In comparison to the simulation with 100 observations, the tendency of CBS to overestimate the number of change-points then also becomes more prominent. Finally, in further simulations (not displayed) we found that the detection power of all methods decreases for smaller $C$ in (d), but all results remain qualitatively the same. All in all, we found that H-SMUCE performs well as the sample size becomes larger, in particular if the constant segments are not too short, as indicated by assumption (3.11) in Theorem 3.10.

A comparison of Tables and (Supplement B) shows that tuned weights increase the detection power of H-SMUCE for all significance levels, so we encourage the user to adapt the weights if prior information on the scales where changes occur is available. Details on how the weights are chosen can be found in Section B.2 in the supplement.


4.2. Robustness against model violations. We begin by investigating how robust the methods are against a violation of the assumption that the standard deviation changes only at the same locations as the mean. We consider continuous changes as well as abrupt changes. The exact functions for the standard deviation can be seen in Figure 10 (Supplement B). In Table (Supplement B) we see that H-SMUCE and CBS are very robust against heterogeneous noise on the constant segments, whereas, remarkably, the detection power of cumSeg is even improved. Moreover, in additional simulations (not displayed) with fewer observations we found that LOOVF is very robust, too.







Moreover, we examine robustness against small periodic trends in the mean in simulations similar to those in (Venkatraman et al., 2004), also adapted to the inhomogeneous variance. The exact simulation setting can be found in Section B.3 (Supplement B). We obtain from Table that H-SMUCE shows similar results for small trends compared to the simulation without trend, but is affected by larger trends, in particular if these are not scaled by the standard deviation. CBS overestimates heavily in all cases, whereas cumSeg (although not affected by the trend) shows over- and underestimation.

Furthermore, we investigate robustness against heavy tails of the error distribution. In Table (Supplement B) we consider $t_3$-distributed errors which are scaled such that the expectation and the standard deviation are the same as in Section 4.1. As expected, SMUCE is not robust against heavy tails (as it misinterprets extreme values as a change in the signal), whereas H-SMUCE provides reasonable results. In comparison to gaussian errors H-SMUCE is not influenced for $K = 0$, underestimation is more distinct in the constant variance scenario and detection power is even increased in the scenario with heterogeneous errors. In comparison, CBS is not influenced for $K = 0$ either, underestimates and overestimates in the constant variance scenario and is slightly worse, with a tendency to underestimation, in the scenario with heterogeneous errors, whereas cumSeg overestimates rarely, but heavily, for $K = 0$, underestimates and overestimates in the constant variance scenario and is robust in the last scenario.

In summary, H-SMUCE seems to be robust against a wide range of variance changes on constant segments and seems to be only slightly affected by tails heavier than gaussian; in particular, no tendency to overestimation was visible in our simulations. This may be explained by the fact that the local likelihood tests of H-SMUCE are quite robust against heterogeneous noise, see for instance (Bakirov and Székely, 2006; Ibragimov and Müller, 2010), and against non-normal errors, see (Lehmann and Romano, 2005) and the references therein. Unlike the number of change-points, the locations are sometimes misestimated, since the restricted maximum likelihood estimator is influenced by changes of the variance. Instead, more robust estimators, for instance local median and MAD estimators, could be used.

5. Application to ion channel recordings 

In this section we apply H-SMUCE to current recordings of a porin in planar lipid bilayers performed in the Steinem lab (Institute of Organic and Biomolecular Chemistry, University of Göttingen). Porins are β-barrel proteins present in the outer membrane of bacteria and in the outer mitochondrial membrane of eukaryotes (Benz, 1994; Schirmer, 1998). Due to their large pore diameter they enable passive diffusion of small solutes like ions or sugars. The partial blockade of the pore by an internal loop results in gating that can be detected using the voltage clamp technique (Sakmann and Neher, 1995). We aim to detect the gating automatically, since in many ion channel applications a hundred or more datasets, each with several hundred thousand data points, have to be analysed. For noise reduction the data was automatically preprocessed in the amplifier with an analogue four-pole Bessel low-pass filter of 1 kHz. Hence, the noise is coloured, but the correlation is less than 10“^ if the trace is subsampled by eleven or more observations, see (Hotz et al., 2013, (6)), which has been done in the following. Finally, we apply H-SMUCE, CBS, cumSeg and LOOVF to 882 subsampled observations.

In Figure 3a we see that the signal fluctuates around two or more levels, the so-called open (higher conductivity, larger current measurements) and closed (lower conductivity, smaller current measurements) states. Moreover, the variance in the open states is larger than in the closed state, a well known phenomenon denoted as open channel noise (Sakmann and Neher (1995, Section 3.4.4) and the references therein), which arises for larger ion channels such as porins from conformational fluctuations in the channel protein (Sigworth, 1985). Due to the pronounced heterogeneity in the variance, methods which assume a constant variance fail to reconstruct the gating, see Figure in the introduction for an illustration. In contrast, H-SMUCE at $\alpha = 0.05$ provides a reasonable fit that covers the main features of the data. Additional smaller changes are found by CBS, cumSeg and LOOVF, see Figure 3c. These changes might be explained by some uncontrollable base line fluctuations caused for instance by small holes in the membrane due to movements of the lipids. On the other hand, we found in the simulations, see Table and the example in Figure (both Supplement B), that CBS and LOOVF tend to include small artificial changes, whereas we saw in Table (Supplement B) that H-SMUCE is quite robust against small periodic trends in the signal. For illustrative purposes, in order to examine these changes further, we increase the significance level to $\alpha = 0.5$ in Figure 3b and detect for instance changes around 33.0s and 33.2s, too. Taking the confidence regions of H-SMUCE into account as well confirms several changes with high "significance" (e.g. the reconstruction of CBS between 33.5 and 33.8) and further changes with less "significance" (e.g. the changes around 33.0s and 33.2s). Other changes could not be confirmed by H-SMUCE at any reasonable significance level (e.g. the peaks of CBS and LOOVF at 33.85). In this spirit H-SMUCE can always be used to accompany any segmentation method to help to identify its significant changes. Recall that, of course, a frequentist statistical error control is only given when $\alpha$ is fixed in advance.

Figure 3. Subsampled observations (black points) together with the confidence band (grey), the confidence intervals for the change-point locations (brackets and thick lines), the estimated change-point locations (red dashes) and the estimate (red line) by H-SMUCE at different $\alpha$, and other estimates. Panels: (a) $\alpha = 0.05$. (b) $\alpha = 0.5$. (c) Estimates by CBS (green), cumSeg (purple) and LOOVF (blue).

Supplement to

Heterogeneous Change Point Inference
by Florian Pein, Hannes Sieling and Axel Munk

Appendix A. Computation 


In this section we detail the computation of the estimator H-SMUCE (Section A.1) and of the critical values $q_1, \ldots, q_{d_n}$ (Section A.2). We also examine the computation time (Section A.3) theoretically and empirically. An R-package is available online.

A.1. Computation of the estimator. First of all, we obtain from the multiscale test the bounds

\[
[\underline{b}_{ij}, \overline{b}_{ij}] := \Biggl[ \overline{Y}_{ij} - \sqrt{\frac{q_{ij}\,\hat{s}_{ij}^2}{j-i+1}},\ \overline{Y}_{ij} + \sqrt{\frac{q_{ij}\,\hat{s}_{ij}^2}{j-i+1}} \Biggr] \tag{A.1}
\]

for $\mu$ on the interval $[i/n, j/n] \in \mathcal{D}$. Therefore, H-SMUCE can be computed as described for SMUCE in (Frick et al., 2014, Section 3). However, in what follows we give a modification of the algorithm which reduces the computation time remarkably due to the small number $O(n)$ of intervals in the dyadic partition $\mathcal{D}$. Here, we first compute left and right limits for the locations of the change-points and then start the dynamic program restricted to these intervals. A notable difference to (Killick et al., 2012; Frick et al., 2014) is that this approach also leads to pruning in the forward step of the dynamic program. More precisely, we define the intersected bounds as




max and := min 


i<s<t<j 

[s/n^t/n]^T> 


[s / n^t / n]^T> 


and set recursively 


Lfc := min |1 < r < Lk+i - 1 : < ^uLfc+i-i f > 

for fc = A,..., 1, with := n -|- 1. The right limits are dehned as 

Rk := min i < r < n : r } , 


for $k = 1, \ldots, \hat{K}$, with $R_0 := 1$. In other words, the left limit $L_k$ for the $k$-th change-point is the smallest number $1 \le r \le n$ such that between $Y_r$ and $Y_n$ a piecewise constant solution with $\hat{K} - k$ change-points exists which respects the bounds (A.1). Analogously, the right limit $R_k$ is the smallest number $1 \le r \le n$ such that between $Y_1$ and $Y_r$ no piecewise constant solution with $k - 1$ change-points exists which fulfils the bounds (A.1). Note that we do not have to compute the right limits separately, since we can just start the dynamic program at $L_k$ and stop if another change-point has to be included. It follows that the $k$-th change-point $\hat{\tau}_k$ has to be in the confidence interval $[L_k/n, R_k/n]$, since otherwise an additional change-point would be necessary to fulfil the multiscale constraints.

R-package: http://www.stochastik.math.uni-goettingen.de/hsmuce
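The basic building blocks of this section, a local bound on one interval and the intersection of several such bounds, can be sketched as follows. This is an illustration under stated assumptions and not the implementation of the R-package: we assume that the local test accepts a candidate mean $m$ on the observations $Y_i, \ldots, Y_j$ if $(j-i+1)(\overline{Y}_{ij} - m)^2/\hat{s}_{ij}^2 \le q_{ij}$, with $\hat{s}_{ij}^2$ the sample variance, and all function names are ours.

```python
import math
from statistics import mean, variance


def local_bounds(y, i, j, q):
    """Interval of means accepted on y[i..j] (0-based, inclusive; at least two observations)."""
    seg = y[i:j + 1]
    centre = mean(seg)
    half_width = math.sqrt(q * variance(seg) / len(seg))
    return centre - half_width, centre + half_width


def intersect(bounds):
    """Intersect several (lower, upper) bounds; the flag tells whether a common value exists."""
    lower = max(b[0] for b in bounds)
    upper = min(b[1] for b in bounds)
    return lower, upper, lower <= upper


y = [0.1, -0.2, 0.0, 0.3, 2.1, 1.8, 2.2, 1.9]
left = local_bounds(y, 0, 3, q=4.0)
right = local_bounds(y, 4, 7, q=4.0)
print(left, right, intersect([left, right]))
```

In this toy example the two halves yield disjoint bounds, so no constant function fits both intervals and a change-point has to be placed between them; the limits $L_k$ and $R_k$ record where such conflicts first occur.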


A.2. Computation of the critical values. In this section we show how the critical values $q_1, \ldots, q_{d_n}$ can be computed by Monte-Carlo simulations. Note first that the following method uses only the continuity and the monotonicity of the cumulative distribution functions of the statistics $T_1, \ldots, T_{d_n}$, and therefore the methodology can also be used for other multiscale tests, see for instance the extension to other interval sets in Remark 2.2. Let $M$ be the number of simulations and $(T_{1,1}, \ldots, T_{d_n,1}), \ldots, (T_{1,M}, \ldots, T_{d_n,M})$ be i.i.d. copies of the vector $(T_1, \ldots, T_{d_n})$. Moreover, we denote by $F_M(\cdot)$ the empirical distribution function of $(T_1, \ldots, T_{d_n})$ and by $F_{M,k}(\cdot)$ the empirical distribution function of the random variable $T_k$. Then, we aim to find a vector of critical values $q_M = (q_{M,1}, \ldots, q_{M,d_n})$ which satisfies with

\[
\alpha - \frac{1}{M} < 1 - F_M(q_M) \le \alpha, \tag{A.2}
\]


an empirical version of condition (2.3), and with 
1 “ FM,ji{(lM,ji) ^ ~ TM,j2(?Mj2 


(A.3) 


/5 


< 


+ A 

^ M 


31 




for all ji, j 2 e {1,... ,dn}, 


32 


an empirical version of condition (2.5). In the following we propose an iterative method to determine such a vector and show afterwards that this vector converges almost surely to the vector of critical values defined by (2.3) and (2.5). As the $k$-th entry of the starting vector we choose the empirical $(1 - \alpha\beta_k)$-quantile of the statistic $T_k$, since the vector with these values satisfies condition (A.3) and the inequality

\[
1 - F_M(\cdot) \le \alpha.
\]

Afterwards, we reduce the entries until the lower bound from condition (A.2) is satisfied, too. To ensure condition (A.3) in every iteration, we always reduce the entry which has the smallest ratio

\[
\frac{1 - F_{M,k}(q_{M,k})}{\beta_k}.
\]

In Algorithm 1 the determination of the critical values is summarized in pseudocode. The method has the advantage that we do not need specific assumptions on the distribution of the vector $(T_1, \ldots, T_{d_n})$ and still get critical values which are adapted to the exact finite sample distribution of $(T_1, \ldots, T_{d_n})$ and therefore ensure the significance level $\alpha$ even for a finite number of observations.

The following theorem shows the convergence of this algorithm to $q = (q_1, \ldots, q_{d_n})$.

Algorithm 1 Determination of the critical values.

Input: The statistics $T_1, \ldots, T_{d_n}$ as well as the significance level $\alpha \in (0,1)$, the weights $\beta_1, \ldots, \beta_{d_n} > 0$, with $\sum_{k=1}^{d_n} \beta_k = 1$, and the number of simulations $M \in \mathbb{N}$.

Output: The vector of critical values $q_M = (q_{M,1}, \ldots, q_{M,d_n})$ which fulfils the conditions (A.2) and (A.3).

    for $i = 1, \ldots, M$ do
        $(T_{1,i}, \ldots, T_{d_n,i}) \leftarrow$ realisation of $(T_1, \ldots, T_{d_n})$
    end for
    for $k = 1, \ldots, d_n$ do
        $(S_{k,1}, \ldots, S_{k,M}) \leftarrow$ sort$((T_{k,1}, \ldots, T_{k,M}))$
        $w_k \leftarrow M - \lfloor \alpha\beta_k M \rfloor$
    end for
    repeat
        $k \leftarrow \operatorname{argmin}_{k=1,\ldots,d_n} \bigl(1 - F_{M,k}(S_{k,w_k})\bigr)/\beta_k$
        $w_k \leftarrow w_k - 1$
    until $1 - F_M(S_{1,w_1}, \ldots, S_{d_n,w_{d_n}}) > \alpha$
    $w_k \leftarrow w_k + 1$
    return $(S_{1,w_1}, \ldots, S_{d_n,w_{d_n}})$

Theorem A.1 (Consistency of Monte-Carlo critical values). The empirical vector of critical values $q_M = (q_{M,1}, \ldots, q_{M,d_n})$ converges almost surely, as the number of simulations $M$ tends to infinity, to the vector of critical values $q = (q_1, \ldots, q_{d_n})$ defined by (2.3) and (2.5).

The computation time is dominated by the generation of the $M$ i.i.d. copies of the vector $(T_1, \ldots, T_{d_n})$. Therefore, we store the generated realizations and recycle them. To avoid memory problems we only store the realizations for every dyadic number, because the significance level $\alpha$ is still satisfied if we determine the critical values based on realizations with a larger number of observations, since then the maxima in $(T_1, \ldots, T_{d_n})$ are taken over more intervals. To this end, the choice $M = 10\,000$ seems to be a good trade-off between computation time and approximation accuracy.


A.3. Computation time. In this section we discuss the theoretical computation time of H-SMUCE and compare it later in simulations with CBS, cumSeg and LOOVF. We stress that the computation time for the bounds, for the limits $L_1, \dots, L_{\hat K}$ (and so for $\hat K$) and for the optimization problem (1.4), and therefore of all confidence sets, is always $O(n)$. Hence, the computation time is dominated by the determination of the restricted maximum likelihood estimator by dynamic programming.


Lemma A.2 (Computation time). The algorithm has data-dependent computation time

$O\Big(\sum_{k=1}^{\hat K - 1} (R_k - L_k + 1)(R_{k+1} - L_{k+1} + 1)\Big).$   (A.4)

This can be bounded by $O(n^2)$ in the worst case, but the computation time is in many cases much smaller. This holds in particular if the signal to noise ratios are large enough such that the change-points are easy to detect, i.e. $R_k - L_k$ is small. This is for instance the case for a fixed signal, where $R_k - L_k$ stays more or less constant. More precisely, by combining (A.4) with equation (3.10) we see that with probability tending to one the computation
time of H-SMUCE is even linear, if —)■ 0, but \og{{anl3k^,n)~^) —t 0. In comparison 

to the computation time of SMUCE, see (Sieling, 2013, (4.3)), which is dominated by the term

$O\Big(\sum_{k=1}^{\hat K - 1} (R_k - R_{k-1})(R_{k+1} - R_k)\Big),$

we see that the computation time is further reduced. In particular, if no change-point is present the computation time is $O(n)$ instead of $O(n^2)$. The computation time is also $O(n)$ if the number of change-points increases linearly in the number of observations and the change-points are sufficiently evenly distributed.
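
To make the data dependence concrete, the following sketch evaluates the sum of products of neighbouring interval widths that appears in (A.4). The function name `dp_cost` and the example limits are hypothetical and chosen only for illustration; constants and lower-order terms hidden in the $O(\cdot)$ notation are ignored.

```python
def dp_cost(lower, upper):
    """Sum over k of (R_k - L_k + 1) * (R_{k+1} - L_{k+1} + 1).

    lower, upper : equally long sequences with the limits L_k and R_k
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper limits must have the same length")
    widths = [r - l + 1 for l, r in zip(lower, upper)]
    # products of widths of neighbouring intervals
    return sum(a * b for a, b in zip(widths, widths[1:]))


# Narrow intervals (change-points easy to detect) give a small cost ...
print(dp_cost([10, 50, 90], [12, 55, 91]))
# ... whereas wide intervals drive the cost towards the quadratic worst case.
print(dp_cost([1, 400], [399, 1000]))
```

With no or only one interval the sum is empty, which matches the remark that the remaining steps then determine the computation time.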

In the following we examine the computation time empirically in a simulation study similar to that in (Maidstone and Pickering, 2014). More precisely, we generate data with varying number of observations $n$ and equidistant change-points. Thereby, we consider $K = 10$, $K = \sqrt{n}$ and $K = n/100$. In all scenarios we choose the values of the mean and the standard deviation function randomly as in Section 4.1, once again with $C = 200$. All simulations are repeated 100 times and terminated after ten seconds. The simulations were performed on a single core system with 1.8 GHz and 8 GB RAM in a 64-bit OS. We fix the significance level $\alpha = 0.1$ as well as the weights $\beta_1 = \dots = \beta_{d_n} = 1/d_n$ and compare H-SMUCE with CBS, LOOVF and cumSeg. Note that we store the Monte-Carlo simulations at first use to reduce later loading times; here we only take the already stored simulations into account. Furthermore, we set for cumSeg the maximal number of change-points $k = \max(2K, 10)$, since for the default parameter $k = \min(30, n/10)$ the program requires a manual increase of $k$ for many simulation runs. Note that the choice above already incorporates prior knowledge about the true signal. We stress (not displayed) that the computation time (and the required memory space) increases sharply with the parameter $k$.

Figure 8 shows that H-SMUCE is much faster than the other methods, in particular if the number of change-points increases. For $K = n/100$ the computation time increases almost linearly in the number of observations. For example, when n = 10^ it is still less than a minute. CBS has the second shortest computation time for larger numbers of observations, whereas cumSeg is superior for smaller numbers of observations. The computation time of CBS for n = 10^ observations is still less than a minute in all scenarios, whereas cumSeg has a similar computation time for $K = 10$, but takes several minutes in the other cases. Lastly, LOOVF exceeds ten seconds already for $n = 400$ observations and is always found to be the slowest method.

(a) $K = 10$. (b) $K = \sqrt{n}$. (c) $K = n/100$.

Figure 8. Mean computation time of H-SMUCE (red crosses), CBS (green triangles), cumSeg (purple circles) and LOOVF (blue squares) for different numbers of observations $n$ and different numbers of change-points $K$. Note that for purposes of visualization the x-axis is displayed non-equidistantly.

Appendix B. Additional Figures and Tables 


In this section we collect additional figures and tables. 


B.1. Simulations. We start with estimates by CBS, cumSeg and LOOVF for the data
from Figure 

The following three tables collect the results of the simulations in Section 4.1. Recall how the random pair $(\mu_R, \sigma_R) \in \mathcal{S}$ is generated (all random variables are independent of each other):

(a) We fix the number of observations $n$, the number of change-points $K$, a constant $C$ and a minimum value for the smallest scale $\lambda_{\min}$.

(b) We draw the locations of the change-points $\tau_0 := 0 < \tau_1 < \dots < \tau_K < 1 =: \tau_{K+1}$ uniformly distributed with the restriction that $\lambda := \min_{k = 0, \dots, K} |\tau_{k+1} - \tau_k| \ge \lambda_{\min}$.

(c) We choose the function values $s_0, \dots, s_K$ of the standard deviation function $\sigma_R$ by $s_k := 2^{U_k}$, where $U_0, \dots, U_K$ are uniformly distributed on $[-2, 2]$.

(d) We determine the function values $m_0, \dots, m_K$ of the signal $\mu_R$ such that


\mk - mk-i\ = 



A+i A A A—1 


-1 


Vfc = 1,...,A. 


'^k-1 

Thereby, we start with $m_0 = 0$ and choose randomly with probability 1/2 whether the expectation increases or decreases.
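
Steps (b) and (c) and the random signs in (d) can be sketched as below. This is only an illustration under assumptions that are not part of the paper: the function names are hypothetical, the spacing restriction is enforced by simple rejection sampling, `lam_min` is expressed as a fraction of the unit interval, and the jump sizes required in (d) are passed in by the caller rather than computed.

```python
import numpy as np


def draw_change_points(K, lam_min, rng, max_tries=100_000):
    """Step (b): uniform locations in (0, 1) with minimal spacing lam_min."""
    for _ in range(max_tries):
        tau = np.concatenate(([0.0], np.sort(rng.uniform(0.0, 1.0, K)), [1.0]))
        if np.min(np.diff(tau)) >= lam_min:
            return tau  # tau_0 = 0 < tau_1 < ... < tau_K < 1 = tau_{K+1}
    raise RuntimeError("no admissible configuration found; lam_min too large?")


def draw_sd_values(K, rng):
    """Step (c): s_k = 2 ** U_k with U_k uniform on [-2, 2], k = 0, ..., K."""
    return 2.0 ** rng.uniform(-2.0, 2.0, K + 1)


def build_means(jumps, rng):
    """Step (d), signs only: start at m_0 = 0 and move up or down with
    probability 1/2 each; the jump sizes |m_k - m_{k-1}| are supplied."""
    jumps = np.asarray(jumps, dtype=float)
    signs = rng.choice([-1.0, 1.0], size=jumps.size)
    return np.concatenate(([0.0], np.cumsum(signs * jumps)))


rng = np.random.default_rng(42)
tau = draw_change_points(K=2, lam_min=0.03, rng=rng)
s = draw_sd_values(K=2, rng=rng)
m = build_means([1.0, 0.5], rng)
print(tau, s, m)
```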


All simulations are repeated 10 000 times. 


B.2. Prior information on scales. To demonstrate the effect of incorporating prior knowledge about those scales where change-points are likely to happen we consider again the observations from Table 3 with $n = 10\,000$, $K = 10$ and $\lambda_{\min} = 50$. To this end, we use adapted weights, where we eliminate the smallest three scales $k = 1, 2, 3$, since all constant segments contain at least 50 observations and therefore these small scales are not needed for detection. Moreover, we choose $\beta_4 = 1/4$, $\beta_5 = 1/4$, $\beta_6 = 1/6$, $\beta_7 = 1/6$, $\beta_8 = 1/12$, $\beta_9 = 1/12$ in decreasing order, since change-points on smaller scales are more likely and harder to detect. For the same reasons we eliminate the four largest scales $k = 10, 11, 12, 13$, too.
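
As a quick sanity check, the adapted weights can be written down and verified to sum to one. The sketch below is an illustration only: representing an eliminated scale by the weight zero and taking thirteen scales in total (read off from the largest eliminated scale) are assumptions of this example, and the variable names are hypothetical.

```python
from fractions import Fraction

D_N = 13  # number of scales assumed for n = 10 000

# Weights on the retained scales k = 4, ..., 9; exact fractions avoid rounding.
adapted = {
    4: Fraction(1, 4), 5: Fraction(1, 4),
    6: Fraction(1, 6), 7: Fraction(1, 6),
    8: Fraction(1, 12), 9: Fraction(1, 12),
}

# Eliminated scales (k = 1, 2, 3 and k = 10, ..., 13) are represented by zero.
weights = [adapted.get(k, Fraction(0)) for k in range(1, D_N + 1)]
uniform = Fraction(1, D_N)

print(sum(weights))          # total weight
print(weights[3] > uniform)  # scale k = 4 gets more weight than under 1/d_n
```

Compared with the uniform choice $1/d_n$, the retained scales each receive a larger share of the significance level, which is the mechanism behind the gain in detection power.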

A comparison of Tables 3 and 4 shows that the modified weights increase the detection
power of H-SMUCE for all significance levels, so we encourage the user to adapt the 
weights if prior information on the scales where changes occur is available.

(a) CBS. (b) cumSeg. (c) LOOVF.

Figure 9. Observations (black points) and true signal (black line) together with estimates 
by CBS, cumSeg and LOOVF for the data from Figure All parameters are chosen as 
described in Section

Method    $-1$     $0$      $+1$     $\ge +2$   $|\hat K - K|$   FPSLE    FNSLE    MISE     MIAE

Setting: $\sigma_0 = 0.5$, $\sigma_1 = 0.5$
HS(0.1)   0.000    0.995    0.004    0.000      0.005            0.82     0.74     0.0119   0.0644
HS(0.3)   0.000    0.975    0.025    0.000      0.025            1.38     0.95     0.0129   0.0672
HS(0.5)   0.000    0.929    0.070    0.001      0.072            2.67     1.40     0.0144   0.0706
CBS       0.000    0.949    0.036    0.015      0.066            2.31     0.94     0.0128   0.0660
cumSeg    0.000    0.995    0.005    0.000      0.005            1.37     1.28     0.0172   0.0707
LOOVF     0.000    0.774    0.142    0.084      0.378            10.10    2.42     0.1402   0.2897

Setting: $\sigma_0 = 0.5$, $\sigma_1 = 1$
HS(0.1)   0.112    0.886    0.002    0.000      0.114            3.99     6.77     0.0543   0.1405
HS(0.3)   0.020    0.961    0.019    0.000      0.039            2.38     2.56     0.0321   0.1086
HS(0.5)   0.005    0.940    0.054    0.001      0.061            3.12     2.30     0.0314   0.1090
CBS       0.042    0.873    0.068    0.017      0.147            5.87     4.93     0.0496   0.1315
cumSeg    0.008    0.969    0.021    0.003      0.034            3.09     2.78     0.0375   0.1126
LOOVF     0.006    0.791    0.112    0.091      0.373            11.30    3.90     0.1720   0.3004

Setting: $\sigma_0 = 0.5$, $\sigma_1 = 1.5$
HS(0.1)   0.484    0.515    0.001    0.000      0.485            12.77    24.89    0.1736   0.3110
HS(0.3)   0.209    0.778    0.012    0.000      0.222            6.81     11.92    0.1025   0.2075
HS(0.5)   0.089    0.872    0.039    0.000      0.129            4.92     6.54     0.0725   0.1690
CBS       0.417    0.454    0.105    0.024      0.577            17.63    25.40    0.1845   0.3385
cumSeg    0.231    0.731    0.032    0.006      0.276            9.60     14.62    0.1149   0.2317
LOOVF     0.135    0.683    0.098    0.085      0.490            15.78    11.92    0.2307   0.3322

Setting: $\sigma_0 = 1$, $\sigma_1 = 1$
HS(0.1)   0.453    0.547    0.001    0.000      0.453            13.49    24.97    0.1514   0.3140
HS(0.3)   0.171    0.818    0.011    0.000      0.182            8.11     12.53    0.0942   0.2170
HS(0.5)   0.062    0.900    0.038    0.000      0.101            6.75     7.99     0.0745   0.1847
CBS       0.156    0.744    0.091    0.008      0.265            9.51     11.41    0.0943   0.2127
cumSeg    0.120    0.876    0.004    0.000      0.124            5.93     8.88     0.0748   0.1839
LOOVF     0.039    0.749    0.132    0.081      0.405            13.29    6.87     0.1947   0.3472

Setting: $\sigma_0 = 1$, $\sigma_1 = 1.5$
HS(0.1)   0.727    0.272    0.000    0.000      0.728            19.44    37.71    0.2237   0.4244
HS(0.3)   0.410    0.584    0.006    0.000      0.416            13.35    23.77    0.1644   0.3256
HS(0.5)   0.218    0.753    0.028    0.000      0.247            10.25    15.64    0.1283   0.2669
CBS       0.491    0.406    0.096    0.008      0.604            18.00    28.42    0.2013   0.3741
cumSeg    0.409    0.580    0.010    0.000      0.420            12.91    22.99    0.1571   0.3155
LOOVF     0.184    0.638    0.105    0.072      0.501            16.55    14.92    0.2410   0.3626

Setting: $\sigma_0 = 1.5$, $\sigma_1 = 1.5$
HS(0.1)   0.844    0.156    0.000    0.000      0.844            22.41    43.65    0.2581   0.4713
HS(0.3)   0.574    0.423    0.003    0.000      0.577            18.21    33.12    0.2219   0.4101
HS(0.5)   0.352    0.629    0.018    0.000      0.371            15.47    25.01    0.1915   0.3582
CBS       0.659    0.258    0.079    0.003      0.746            20.73    35.81    0.2449   0.4379
cumSeg    0.629    0.369    0.002    0.000      0.631            17.56    33.32    0.2147   0.4067
LOOVF     0.297    0.534    0.104    0.066      0.589            19.34    21.26    0.2715   0.4046

Table 1. Simulations with a single change (fixed signal and variances): $n = 100$ observations and a single change at 0.5, from 0 to 1 for different standard deviations changing from $\sigma_0$ to $\sigma_1$ at 0.5, too. Columns from left to right: setting, method, proportions of $\hat K - K$ and averages of the corresponding error criteria. HS($\alpha$) denotes H-SMUCE at significance level $\alpha$.


Method    $\le -2$   $-1$     $0$      $+1$     $\ge +2$   $|\hat K - K|$   FPSLE     FNSLE     MISE     MIAE

Setting: $n = 1000$, $K = 0$, $\sigma = \sigma_R = \text{const}$
HS(0.1)   -          -        0.965    0.035    0.000      0.035            17.75     4.73      0.0035   0.0365
HS(0.3)   -          -        0.867    0.128    0.005      0.138            68.95     18.42     0.0045   0.0401
HS(0.5)   -          -        0.719    0.256    0.025      0.307            153.45    41.25     0.0061   0.0454
S(0.1)    -          -        0.965    0.034    0.001      0.036            17.90     5.03      0.0039   0.0371
S(0.3)    -          -        0.832    0.160    0.008      0.177            88.45     24.80     0.0059   0.0435
S(0.5)    -          -        0.667    0.298    0.035      0.370            184.90    50.94     0.0082   0.0499
CBS       -          -        0.991    0.000    0.009      0.018            8.90      1.26      0.0037   0.0351
cumSeg    -          -        0.999    0.001    0.000      0.001            0.30      0.06      0.0029   0.0345

Setting: $n = 1000$, $K = 2$, $\lambda_{\min} = 30$, $\mu = \mu_R$, $\sigma = 1$
HS(0.1)   0.010      0.174    0.802    0.014    0.000      0.208            26.32     72.66     0.0132   0.0613
HS(0.3)   0.004      0.108    0.819    0.067    0.002      0.187            38.10     52.90     0.0114   0.0571
HS(0.5)   0.002      0.070    0.768    0.150    0.010      0.244            64.14     48.50     0.0111   0.0573
S(0.1)    0.003      0.074    0.912    0.011    0.000      0.092            16.96     34.03     0.0092   0.0513
S(0.3)    0.001      0.040    0.892    0.065    0.002      0.112            32.24     27.39     0.0090   0.0513
S(0.5)    0.001      0.025    0.806    0.155    0.013      0.209            63.30     32.33     0.0095   0.0536
CBS       0.005      0.060    0.821    0.082    0.033      0.221            37.55     37.57     0.0111   0.0527
cumSeg    0.025      0.116    0.749    0.099    0.011      0.289            65.32     82.63     0.0364   0.0738

Setting: $n = 1000$, $K = 2$, $\lambda_{\min} = 50$, $\mu = \mu_R$, $\sigma = 1$
HS(0.1)   0.009      0.160    0.815    0.015    0.000      0.194            27.14     68.91     0.0127   0.0611
HS(0.3)   0.004      0.098    0.829    0.067    0.001      0.176            37.77     49.63     0.0111   0.0572
HS(0.5)   0.002      0.063    0.774    0.152    0.009      0.237            63.46     46.06     0.0109   0.0573
S(0.1)    0.003      0.068    0.919    0.009    0.000      0.084            16.82     31.94     0.0091   0.0515
S(0.3)    0.001      0.035    0.899    0.063    0.002      0.104            31.19     25.81     0.0090   0.0515
S(0.5)    0.001      0.020    0.819    0.147    0.013      0.195            59.86     30.23     0.0095   0.0537
CBS       0.005      0.058    0.824    0.083    0.031      0.215            37.50     36.27     0.0112   0.0532
cumSeg    0.023      0.110    0.769    0.090    0.008      0.262            59.74     79.25     0.0336   0.0741

Setting: $n = 1000$, $K = 10$, $\lambda_{\min} = 30$, $\mu = \mu_R$, $\sigma = 1$
HS(0.1)   0.508      0.330    0.161    0.001    0.000      1.634            54.37     172.66    0.1112   0.1842
HS(0.3)   0.354      0.377    0.263    0.006    0.000      1.233            44.53     127.81    0.0817   0.1561
HS(0.5)   0.253      0.384    0.346    0.017    0.000      0.987            40.88     102.88    0.0679   0.1419
S(0.1)    0.163      0.352    0.485    0.001    0.000      0.721            29.14     77.49     0.0424   0.1193
S(0.3)    0.093      0.301    0.598    0.007    0.000      0.513            24.23     56.17     0.0366   0.1099
S(0.5)    0.062      0.258    0.657    0.022    0.001      0.415            23.34     46.37     0.0342   0.1060
CBS       0.033      0.129    0.531    0.204    0.102      0.644            42.69     45.08     0.0417   0.1078
cumSeg    0.163      0.216    0.403    0.165    0.053      0.904            65.16     105.59    0.1107   0.1492

Setting: $n = 1000$, $K = 10$, $\lambda_{\min} = 50$, $\mu = \mu_R$, $\sigma = 1$
HS(0.1)   0.445      0.356    0.198    0.001    0.000      1.474            59.32     162.03    0.0913   0.1801
HS(0.3)   0.303      0.384    0.307    0.005    0.000      1.104            47.34     120.10    0.0682   0.1532
HS(0.5)   0.213      0.379    0.390    0.018    0.001      0.881            41.98     96.70     0.0577   0.1398
S(0.1)    0.155      0.351    0.494    0.000    0.000      0.697            32.51     77.29     0.0426   0.1235
S(0.3)    0.085      0.299    0.612    0.004    0.000      0.485            26.14     55.78     0.0368   0.1131
S(0.5)    0.054      0.252    0.680    0.014    0.000      0.381            23.81     45.39     0.0344   0.1086
CBS       0.027      0.135    0.524    0.203    0.111      0.653            45.64     44.88     0.0425   0.1116
cumSeg    0.165      0.217    0.389    0.179    0.050      0.904            63.73     104.37    0.1037   0.1522

Table 2. Simulations with constant variance and $C = 200$. Columns from left to right: setting, method, proportions of $\hat K - K$ and averages of the corresponding error criteria. HS($\alpha$) and S($\alpha$) denote H-SMUCE and SMUCE at significance level $\alpha$, respectively.


Method    $\le -2$   $-1$     $0$      $+1$     $\ge +2$   $|\hat K - K|$   FPSLE     FNSLE     MISE     MIAE

Setting: $n = 100$, $K = 2$, $\lambda_{\min} = 15$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.000      0.125    0.873    0.002    0.000      0.128            1.51      4.07      0.8182   0.3308
HS(0.3)   0.000      0.042    0.945    0.013    0.000      0.055            1.04      1.70      0.4217   0.2482
HS(0.5)   0.000      0.016    0.940    0.043    0.000      0.060            1.63      1.26      0.2776   0.2291
CBS       0.000      0.001    0.925    0.058    0.016      0.092            2.03      0.79      0.2220   0.2143
cumSeg    0.000      0.066    0.720    0.167    0.047      0.343            6.50      4.39      0.4898   0.3053
LOOVF     0.000      0.031    0.700    0.163    0.106      0.683            12.83     3.36      0.3167   0.2639

Setting: $n = 100$, $K = 5$, $\lambda_{\min} = 15$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.608      0.364    0.028    0.000    0.000      1.610            13.51     32.33     9.5104   1.8626
HS(0.3)   0.212      0.577    0.211    0.000    0.000      1.003            8.63      19.80     6.5362   1.3263
HS(0.5)   0.061      0.466    0.473    0.001    0.000      0.588            5.27      11.65     3.9992   0.9047
CBS       0.001      0.008    0.884    0.089    0.018      0.137            1.65      1.02      0.4539   0.3130
cumSeg    0.098      0.230    0.544    0.117    0.012      0.588            6.93      12.13     1.2454   0.5441
LOOVF     0.031      0.112    0.520    0.152    0.184      1.648            14.61     6.92      0.5887   0.4042

Setting: $n = 1000$, $K = 2$, $\lambda_{\min} = 30$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.000      0.007    0.974    0.018    0.000      0.026            8.42      5.83      0.0195   0.0617
HS(0.3)   0.000      0.001    0.921    0.075    0.002      0.080            24.72     9.57      0.0193   0.0636
HS(0.5)   0.000      0.000    0.827    0.162    0.012      0.185            53.23     17.09     0.0204   0.0668
CBS       0.005      0.019    0.774    0.146    0.056      0.298            52.95     21.17     0.0347   0.0711
cumSeg    0.022      0.161    0.683    0.103    0.030      0.387            64.04     92.66     0.0765   0.1112

Setting: $n = 1000$, $K = 2$, $\lambda_{\min} = 50$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.000      0.002    0.982    0.017    0.000      0.018            7.25      4.35      0.0182   0.0630
HS(0.3)   0.000      0.000    0.926    0.071    0.002      0.076            22.64     8.49      0.0196   0.0657
HS(0.5)   0.000      0.000    0.830    0.160    0.010      0.181            50.02     16.22     0.0214   0.0692
CBS       0.003      0.011    0.776    0.153    0.057      0.296            53.69     15.85     0.0355   0.0730
cumSeg    0.016      0.155    0.699    0.098    0.031      0.370            60.63     84.69     0.0739   0.1132

Setting: $n = 1000$, $K = 10$, $\lambda_{\min} = 30$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.123      0.429    0.446    0.002    0.000      0.686            22.83     55.06     0.4045   0.2402
HS(0.3)   0.016      0.199    0.770    0.015    0.000      0.245            11.98     21.12     0.1863   0.1618
HS(0.5)   0.002      0.088    0.863    0.045    0.001      0.140            11.84     12.71     0.1220   0.1404
CBS       0.002      0.008    0.463    0.316    0.211      0.843            47.26     15.20     0.1274   0.1435
cumSeg    0.439      0.243    0.187    0.085    0.046      1.674            94.91     228.44    0.3120   0.2806

Setting: $n = 1000$, $K = 10$, $\lambda_{\min} = 50$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.025      0.262    0.711    0.002    0.000      0.315            16.94     32.39     0.2102   0.1866
HS(0.3)   0.002      0.058    0.925    0.015    0.000      0.076            8.46      10.58     0.1009   0.1372
HS(0.5)   0.000      0.017    0.940    0.043    0.001      0.061            9.03      7.72      0.0860   0.1307
CBS       0.001      0.007    0.451    0.319    0.222      0.868            47.81     15.10     0.1293   0.1463
cumSeg    0.433      0.254    0.197    0.082    0.035      1.601            97.00     223.47    0.2771   0.2794

Setting: $n = 10\,000$, $K = 2$, $\lambda_{\min} = 30$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.000      0.004    0.983    0.013    0.000      0.017            50.65     30.94     0.0016   0.0183
HS(0.3)   0.000      0.002    0.936    0.061    0.001      0.065            188.73    63.72     0.0016   0.0188
HS(0.5)   0.000      0.001    0.865    0.128    0.006      0.142            407.41    125.46    0.0016   0.0197
CBS       0.012      0.036    0.532    0.200    0.220      0.886            1548.96   373.22    0.0057   0.0235
cumSeg    0.054      0.245    0.600    0.084    0.017      0.477            682.64    1457.08   0.0090   0.0379

Setting: $n = 10\,000$, $K = 2$, $\lambda_{\min} = 50$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.000      0.001    0.984    0.015    0.000      0.016            53.23     24.89     0.0014   0.0182
HS(0.3)   0.000      0.000    0.941    0.057    0.002      0.060            181.06    59.83     0.0014   0.0188
HS(0.5)   0.000      0.000    0.870    0.124    0.007      0.137            394.16    115.62    0.0016   0.0197
CBS       0.012      0.035    0.521    0.208    0.225      0.917            1601.54   366.42    0.0058   0.0238
cumSeg    0.052      0.241    0.603    0.087    0.016      0.473            673.81    1430.47   0.0084   0.0377

Setting: $n = 10\,000$, $K = 10$, $\lambda_{\min} = 30$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.023      0.231    0.741    0.005    0.000      0.282            58.42     165.72    0.0178   0.0431
HS(0.3)   0.006      0.123    0.844    0.027    0.000      0.162            68.27     98.25     0.0122   0.0385
HS(0.5)   0.003      0.079    0.854    0.064    0.002      0.151            108.19    87.63     0.0103   0.0377
CBS       0.024      0.043    0.180    0.222    0.531      2.088            1286.59   525.95    0.0198   0.0475
cumSeg    0.619      0.169    0.130    0.059    0.024      2.345            1000.55   3122.28   0.0433   0.0917

Setting: $n = 10\,000$, $K = 10$, $\lambda_{\min} = 50$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.009      0.165    0.819    0.007    0.000      0.190            59.11     124.05    0.0132   0.0418
HS(0.3)   0.001      0.064    0.905    0.029    0.001      0.097            67.32     65.54     0.0089   0.0375
HS(0.5)   0.000      0.029    0.900    0.067    0.003      0.102            103.42    60.04     0.0078   0.0368
CBS       0.019      0.034    0.162    0.228    0.557      2.203            1317.31   467.47    0.0198   0.0475
cumSeg    0.607      0.188    0.131    0.051    0.023      2.277            997.64    3105.88   0.0405   0.0925

Setting: $n = 10\,000$, $K = 25$, $\lambda_{\min} = 30$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.609      0.284    0.107    0.001    0.000      1.908            155.65    504.02    0.1016   0.1031
HS(0.3)   0.278      0.399    0.318    0.006    0.000      1.044            94.53     263.30    0.0640   0.0789
HS(0.5)   0.140      0.371    0.470    0.019    0.000      0.696            84.07     182.54    0.0483   0.0703
CBS       0.015      0.024    0.069    0.128    0.765      3.348            921.91    409.98    0.0411   0.0723
cumSeg    0.934      0.036    0.018    0.009    0.003      6.028            1043.82   3488.43   0.1159   0.1540

Setting: $n = 10\,000$, $K = 25$, $\lambda_{\min} = 50$, $\mu = \mu_R$, $\sigma = \sigma_R$
HS(0.1)   0.396      0.383    0.220    0.001    0.000      1.334            146.74    387.66    0.0699   0.0945
HS(0.3)   0.103      0.359    0.528    0.010    0.000      0.591            85.33     175.03    0.0390   0.0715
HS(0.5)   0.038      0.241    0.690    0.030    0.001      0.352            78.74     114.01    0.0291   0.0647
CBS       0.010      0.017    0.055    0.120    0.799      3.529            934.29    346.33    0.0405   0.0726
cumSeg    0.934      0.036    0.019    0.008    0.003      5.849            1053.35   3462.62   0.1022   0.1547

Table 3. Simulations with heterogeneous errors and $C = 200$. Columns from left to right: setting, method, proportions of $\hat K - K$ and averages of the corresponding error criteria. HS($\alpha$) denotes H-SMUCE at significance level $\alpha$.


Method    $\le -2$   $-1$     $0$      $+1$     $\ge +2$   $|\hat K - K|$   FPSLE     FNSLE     MISE     MIAE
HS(0.1)   0.005      0.117    0.876    0.002    0.000      0.130            50.82     113.50    0.0107   0.0406
HS(0.3)   0.000      0.032    0.952    0.016    0.000      0.049            48.39     49.84     0.0075   0.0368
HS(0.5)   0.000      0.013    0.940    0.045    0.001      0.061            78.86     48.19     0.0072   0.0368

Table 4. $n = 10\,000$ observations, $K = 10$ change-points, $C = 200$ and $\lambda_{\min} = 50$ from Table 3. Columns from left to right: method, proportions of $\hat K - K$ and averages of the corresponding error criteria. HS($\alpha$) denotes H-SMUCE at significance level $\alpha$, but with weights $\beta_4, \dots, \beta_9$.


B.3. Robustness. Figure 10 shows the standard deviation functions in Table 5 to examine robustness against variance changes on constant segments. We consider the sine-shaped standard deviation $\sigma_1$ (continuous changes), the piecewise linear standard deviation $\sigma_2$ (continuous and abrupt changes at the same time) and the piecewise constant standard deviation $\sigma_3$ (abrupt changes). Moreover, we analyse in Table 6 robustness against small periodic trends in simulations similar to those in (Venkatraman et al., 2004). More precisely, we generate the random pairs $(\mu_R, \sigma_R) \in \mathcal{S}$ as described in (a)-(d), but replace the signal $\mu_R$ by

$\mu_T(i/n) = \mu_R(i/n) + b \sin(a \pi i)$

and

$\mu_{T,\sigma}(i/n) = \mu_R(i/n) + b\,\sigma_R(i/n) \sin(a \pi i) + b\,(\sigma_R(i/n) - \sigma_R((i-1)/n)) \sin(a \pi i), \quad i = 1, \dots, n,$

respectively. The signal $\mu_T$ reflects the situation of a fixed periodic trend, whereas in $\mu_{T,\sigma}$ the trend is scaled by the local standard deviation. The last term corrects the size of changes such that still $\sigma_R$ determines the changes. We consider as in (Venkatraman et al., 2004) long ($a = 0.01$) and short ($a = 0.025$) trends. Finally, Table reports result of distributed errors.


Figure 10. (a) Continuous sine-shaped standard deviation $\sigma_1(t) := 1 + 0.5 \sin(20 \pi t)$. (b) Piecewise linear standard deviation $\sigma_2(t) := 0.5 + \sum_{i=0}^{9} (10t - i)\, \mathbb{1}_{(0.1 i,\, 0.1(i+1)]}(t)$. (c) Piecewise constant standard deviation $\sigma_3(t) := 0.5\, \mathbb{1}_{(200(i-1)/n,\; 200(i-1)/n + 100/n]}(t) + \mathbb{1}_{(200(i-1)/n + 100/n,\; 200 i/n]}(t)$, exemplary for $n = 1000$.
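
The sine-shaped and the piecewise constant standard deviation of Figure 10, as well as the fixed periodic trend of Section B.3, can be evaluated with a few lines of Python. The sketch below is an illustration under assumptions: the function names are hypothetical, the piecewise constant function is evaluated at the design points $i/n$ by alternating blocks of 100 observations, and the piecewise linear variant is omitted.

```python
import numpy as np


def sigma1(t):
    """Sine-shaped standard deviation 1 + 0.5 * sin(20 * pi * t)."""
    return 1.0 + 0.5 * np.sin(20.0 * np.pi * np.asarray(t, dtype=float))


def sigma3(n):
    """Piecewise constant standard deviation at i/n, i = 1, ..., n:
    0.5 on the first 100 observations of every block of 200, 1 on the rest."""
    position_in_block = (np.arange(1, n + 1) - 1) % 200
    return np.where(position_in_block < 100, 0.5, 1.0)


def add_fixed_trend(mu, a, b):
    """Fixed periodic trend: mu(i/n) + b * sin(a * pi * i), i = 1, ..., n."""
    mu = np.asarray(mu, dtype=float)
    i = np.arange(1, mu.size + 1)
    return mu + b * np.sin(a * np.pi * i)


n = 1000
grid = np.arange(1, n + 1) / n
print(sigma1(grid)[:3])
print(sigma3(n)[[0, 99, 100, 199, 200]])
print(add_fixed_trend(np.zeros(n), a=0.01, b=0.1)[:3])
```

For the long trend $a = 0.01$ one period of the sine spans 200 observations, for the short trend $a = 0.025$ it spans 80.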


Appendix C. Proofs 


In this section we collect the proofs together with some auxiliary statements. 


C.1. Proof of Lemma 2.1.


Proof of Lemma 2.1. A single statistic $T_i^j(Z, 0)$ has the c.d.f. $F_{1,j-i}(\cdot)$ of an F-distribution with $(1, j - i)$ degrees of freedom. Thus, $F_k(\cdot)$ is continuous and strictly monotonically increasing for positive arguments. Now, it follows from equation (2.5) that


(C.l) 


Qk = F, 


= 2 d 


This together with equation (2.3) yields 

n(9i)) 


G(,i):=rUri,r2-Mi 








a. 


Note that $F$ is continuous and $\lim_{q_k \to 0} F(q_1, \dots, q_{d_n}) = 0$ for all $k = 1, \dots, d_n$ as well as $\lim_{q_1, \dots, q_{d_n} \to \infty} F(q_1, \dots, q_{d_n}) = 1$. Thus, the function $G$ is continuous, strictly monotonically increasing on $[0, \infty)$ and attains all values in $[0, 1)$. Therefore, the existence of the vector of critical values follows from the intermediate value theorem and the vector is also unique. □


Method    $\le -2$   $-1$     $0$      $+1$     $\ge +2$   $|\hat K - K|$   FPSLE     FNSLE     MISE     MIAE

Setting: $n = 1000$, $K = 0$, $\sigma = \sigma_1$
HS(0.1)   -          -        0.968    0.032    0.000      0.033            16.30     4.12      0.0013   0.0277
HS(0.3)   -          -        0.876    0.118    0.005      0.129            64.60     15.91     0.0018   0.0306
HS(0.5)   -          -        0.734    0.239    0.027      0.293            146.75    36.45     0.0023   0.0338
CBS       -          -        0.916    0.001    0.083      0.186            93.25     11.21     0.0045   0.0288
cumSeg    -          -        1.000    0.000    0.000      0.000            0.20      0.04      0.0011   0.0264

Setting: $n = 1000$, $K = 0$, $\sigma = \sigma_2$
HS(0.1)   -          -        0.968    0.031    0.001      0.032            16.10     4.12      0.0013   0.0278
HS(0.3)   -          -        0.876    0.118    0.005      0.129            64.55     15.73     0.0017   0.0306
HS(0.5)   -          -        0.734    0.241    0.024      0.292            145.80    35.28     0.0022   0.0340
CBS       -          -        0.937    0.004    0.060      0.135            67.70     8.96      0.0034   0.0281
cumSeg    -          -        0.999    0.001    0.000      0.001            0.40      0.12      0.0011   0.0264

Setting: $n = 1000$, $K = 0$, $\sigma = \sigma_3$
HS(0.1)   -          -        0.969    0.030    0.001      0.032            15.75     3.91      0.0007   0.0210
HS(0.3)   -          -        0.875    0.119    0.006      0.130            65.10     16.31     0.0009   0.0227
HS(0.5)   -          -        0.737    0.236    0.026      0.290            145.15    36.09     0.0012   0.0250
CBS       -          -        0.937    0.002    0.061      0.134            67.10     8.64      0.0019   0.0213
cumSeg    -          -        0.999    0.001    0.000      0.001            0.35      0.10      0.0006   0.0199

Setting: $n = 10\,000$, $K = 10$, $\lambda_{\min} = 50$, $\mu = \mu_R$, $\sigma = \sigma_1$
HS(0.1)   0.013      0.185    0.796    0.005    0.000      0.218            661.16    755.83    0.0212   0.0684
HS(0.3)   0.003      0.076    0.890    0.031    0.001      0.113            543.91    548.21    0.0167   0.0585
HS(0.5)   0.001      0.041    0.886    0.069    0.003      0.117            513.55    468.37    0.0147   0.0542
CBS       0.000      0.001    0.191    0.155    0.653      2.636            1590.35   276.51    0.0092   0.0358
cumSeg    0.206      0.118    0.413    0.193    0.070      0.984            790.10    1054.73   0.0146   0.0502

Setting: $n = 10\,000$, $K = 10$, $\lambda_{\min} = 50$, $\mu = \mu_R$, $\sigma = \sigma_2$
HS(0.1)   0.014      0.205    0.776    0.006    0.000      0.238            421.19    513.32    0.0156   0.0556
HS(0.3)   0.001      0.077    0.894    0.027    0.001      0.108            348.50    358.14    0.0119   0.0475
HS(0.5)   0.000      0.038    0.897    0.062    0.002      0.105            344.93    311.35    0.0106   0.0446
CBS       0.000      0.000    0.215    0.174    0.611      2.362            1454.85   247.26    0.0085   0.0346
cumSeg    0.114      0.102    0.467    0.236    0.082      0.795            756.12    720.95    0.0136   0.0478

Setting: $n = 10\,000$, $K = 10$, $\lambda_{\min} = 50$, $\mu = \mu_R$, $\sigma = \sigma_3$
HS(0.1)   0.019      0.233    0.744    0.004    0.000      0.276            161.27    251.06    0.0053   0.0301
HS(0.3)   0.002      0.069    0.904    0.025    0.000      0.099            137.86    136.79    0.0036   0.0254
HS(0.5)   0.000      0.029    0.906    0.062    0.003      0.096            170.29    128.56    0.0033   0.0248
CBS       0.000      0.000    0.246    0.173    0.582      2.189            1134.85   214.71    0.0047   0.0263
cumSeg    0.054      0.051    0.516    0.279    0.101      0.669            749.33    499.10    0.0070   0.0346

Table 5. Simulations with standard deviations $\sigma_1(\cdot)$-$\sigma_3(\cdot)$ from Figure 10 and $C = 200$. Columns from left to right: setting, method, proportions of $\hat K - K$ and averages of the corresponding error criteria. HS($\alpha$) denotes H-SMUCE at significance level $\alpha$.


C.2. Proof of Lemma 3.1. First of all, recall from the proof of Lemma 2.1 that the statistic $T_k$ has c.d.f. $F_k$. For every $k = 1, \dots, d_n$ we use the transformation




and the identity 

T, = (t/f, 

Here, $F^{-1}_{1,2^k-1}(\cdot)$ denotes the quantile function of an F-distribution with $(1, 2^k - 1)$ degrees of freedom. Analogously, we define

Pd 


$q_{k,U} := F_{1,2^k-1}(q_k)$
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Setting 

Method 

< -2 

-1 

0 

+1 

> +2 

\k-K\ 

FPSLE 

FNSLE 

MISE 

MIAE 

M = MT, 

HS(O.l) 

0.137 

0.421 

0.439 

0.003 

0.000 

0.709 

24.74 

58.06 

0.3976 

0.2449 

a — 0.01, 

HS(0.3) 

0.019 

0.209 

0.741 

0.032 

0.000 

0.279 

15.65 

24.13 

0.1814 

0.1681 

b = 0.1, 

HS(0.5) 

0.003 

0.091 

0.822 

0.081 

0.003 

0.184 

17.32 

15.63 

0.1206 

0.1474 

a ^ an 

CBS 

0.001 

0.011 

0.383 

0.290 

0.315 

1.141 

72.93 

22.14 

0.1321 

0.1535 


cumSeg 

0.443 

0.237 

0.192 

0.085 

0.043 

1.663 

95.08 

225.57 

0.3080 

0.2823 

M = iJ-T, 

HS(O.l) 

0.149 

0.360 

0.370 

0.104 

0.016 

0.821 

73.52 

94.89 

0.4410 

0.3070 

a = 0.01, 

HS(0.3) 

0.029 

0.181 

0.496 

0.226 

0.067 

0.611 

78.00 

62.38 

0.2304 

0.2376 

6 = 0.3, 

HS(0.5) 

0.007 

0.092 

0.466 

0.306 

0.129 

0.697 

88.73 

53.42 

0.1595 

0.2169 

a ^ an 

CBS 

0.001 

0.006 

0.082 

0.135 

0.776 

3.243 

249.67 

74.97 

0.1646 

0.2325 


cumSeg 

0.439 

0.233 

0.200 

0.086 

0.043 

1.652 

107.57 

237.10 

0.3394 

0.3222 

M = iJ-T, 

                HS(0.1)   0.140  0.287  0.323  0.176  0.075   0.936  134.92  135.90  0.5279  0.4052
a = 0.01,       HS(0.3)   0.032  0.146  0.327  0.298  0.197   0.970  146.40  103.03  0.3106  0.3393
b = 0.5,        HS(0.5)   0.009  0.076  0.258  0.329  0.328   1.223  162.18   92.24  0.2410  0.3207
σ = σ_R         CBS       0.002  0.004  0.020  0.043  0.932   5.623  435.32  124.66  0.2462  0.3440
                cumSeg    0.420  0.244  0.190  0.093  0.054   1.641  127.88  248.73  0.3921  0.3823

μ = μ_T,        HS(0.1)   0.128  0.424  0.446  0.002  0.000   0.693   23.50   56.36  0.4066  0.2415
a = 0.025,      HS(0.3)   0.017  0.201  0.759  0.023  0.001   0.259   13.43   22.21  0.1854  0.1628
b = 0.1,        HS(0.5)   0.003  0.086  0.843  0.066  0.002   0.162   14.51   13.77  0.1218  0.1416
σ = σ_R         CBS       0.002  0.008  0.395  0.287  0.308   1.135   64.07   19.55  0.1304  0.1471
                cumSeg    0.440  0.240  0.188  0.086  0.046   1.672   95.36  229.25  0.3058  0.2796

μ = μ_T,        HS(0.1)   0.108  0.344  0.411  0.111  0.027   0.738   58.57   73.26  0.4223  0.2606
a = 0.025,      HS(0.3)   0.016  0.138  0.468  0.252  0.126   0.715   77.74   50.19  0.2143  0.1929
b = 0.3,        HS(0.5)   0.003  0.058  0.382  0.315  0.243   0.989  101.66   48.03  0.1503  0.1749
σ = σ_R         CBS       0.002  0.003  0.050  0.065  0.880   5.370  356.54   85.59  0.1585  0.2027
                cumSeg    0.438  0.241  0.184  0.091  0.046   1.678  101.67  234.23  0.3243  0.2958

μ = μ_T,        HS(0.1)   0.054  0.180  0.276  0.226  0.264   1.247  164.83  114.26  0.4732  0.3127
a = 0.025,      HS(0.3)   0.007  0.060  0.195  0.229  0.509   1.945  214.89  100.21  0.2748  0.2586
b = 0.5,        HS(0.5)   0.001  0.027  0.127  0.191  0.654   2.591  256.78  101.30  0.2115  0.2465
σ = σ_R         CBS       0.000  0.001  0.006  0.011  0.982  10.383  709.91  149.02  0.2261  0.2993
                cumSeg    0.439  0.238  0.184  0.088  0.050   1.698  113.20  245.71  0.3520  0.3206

μ = μ_{T,?},    HS(0.1)   0.151  0.435  0.411  0.002  0.000   0.755   26.56   64.22  0.4469  0.3000
a = 0.01,       HS(0.3)   0.021  0.230  0.725  0.023  0.000   0.297   15.80   27.08  0.2289  0.2245
b = 0.2,        HS(0.5)   0.004  0.103  0.819  0.071  0.003   0.188   17.53   17.34  0.1622  0.2019
σ = σ_R         CBS       0.004  0.012  0.342  0.305  0.338   1.225   80.55   26.89  0.1689  0.2056
                cumSeg    0.422  0.233  0.193  0.095  0.056   1.653  107.36  234.27  0.3431  0.3292

μ = μ_{T,?},    HS(0.1)   0.254  0.410  0.298  0.036  0.001   1.012   68.27  115.70  0.6346  0.4582
a = 0.01,       HS(0.3)   0.055  0.261  0.488  0.176  0.019   0.591   81.26   80.78  0.4483  0.3953
b = 0.5,        HS(0.5)   0.015  0.129  0.466  0.321  0.070   0.624  100.70   72.34  0.3826  0.3694
σ = σ_R         CBS       0.002  0.004  0.022  0.059  0.914   4.231  357.42  117.42  0.2907  0.3140
                cumSeg    0.332  0.211  0.198  0.139  0.121   1.575  179.79  280.65  0.4522  0.4317

μ = μ_{T,?},    HS(0.1)   0.136  0.440  0.422  0.002  0.000   0.726   24.82   59.14  0.4567  0.3165
a = 0.025,      HS(0.3)   0.017  0.219  0.746  0.018  0.000   0.273   14.12   24.13  0.2432  0.2435
b = 0.2,        HS(0.5)   0.003  0.096  0.843  0.056  0.002   0.162   14.44   14.77  0.1756  0.2222
σ = σ_R         CBS       0.003  0.011  0.353  0.295  0.338   1.231   71.51   23.32  0.1892  0.2287
                cumSeg    0.432  0.238  0.182  0.094  0.054   1.688  103.19  236.34  0.3604  0.3501

μ = μ_{T,?},    HS(0.1)   0.181  0.433  0.370  0.016  0.000   0.831   37.89   75.31  0.7365  0.5244
a = 0.025,      HS(0.3)   0.033  0.240  0.594  0.125  0.009   0.450   42.28   44.37  0.5518  0.4736
b = 0.5,        HS(0.5)   0.007  0.110  0.541  0.281  0.061   0.534   64.52   41.39  0.4981  0.4582
σ = σ_R         CBS       0.002  0.002  0.023  0.043  0.929   5.589  365.42  103.32  0.4594  0.4362
                cumSeg    0.316  0.184  0.179  0.139  0.182   1.735  158.60  254.38  0.6153  0.5317

Table 6. Simulations with small periodic trends in the mean and $n = 1000$, $K = 10$,
$\lambda_{\min} = 30$ and $C = 200$. Columns from left to right: setting, method, proportions of $\hat{K} - K$
and averages of the corresponding error criteria. HS($\alpha$) denotes H-SMUCE at significance
level $\alpha$.
Setting         Method    ≤ -2   -1     0      +1     ≥ +2   |K̂ - K|  FPSLE    FNSLE    MISE    MIAE

n = 1000,       HS(0.1)   -      -      0.982  0.018  0.000   0.018     9.05     2.53  0.0031  0.0347
K = 0,          HS(0.3)   -      -      0.927  0.071  0.001   0.074    37.05    10.31  0.0034  0.0361
μ = […]         HS(0.5)   -      -      0.824  0.167  0.009   0.185    92.40    26.35  0.0043  0.0392
σ = σ_R,        S(0.1)    -      -      0.001  0.001  0.999  11.859  5929.70   369.55  0.8710  0.1491
                S(0.3)    -      -      0.000  0.000  1.000  14.803  7401.65   397.77  0.9338  0.1674
                S(0.5)    -      -      0.000  0.000  1.000  16.862  8431.00   411.30  0.9730  0.1787
                CBS       -      -      0.991  0.000  0.009   0.018     9.05     1.13  0.0058  0.0340
                cumSeg    -      -      0.955  0.001  0.044   0.188    93.90    11.98  0.0682  0.0375

n = 1000,       HS(0.1)   0.008  0.136  0.848  0.007  0.000   0.160    25.70    62.95  0.0120  0.0578
K = 2,          HS(0.3)   0.003  0.086  0.876  0.035  0.000   0.127    29.74    44.61  0.0103  0.0537
λ_min = 30,     HS(0.5)   0.001  0.055  0.851  0.090  0.003   0.152    44.21    38.62  0.0097  0.0524
[…]             S(0.1)    0.000  0.000  0.001  0.001  0.998  11.104  2683.40   250.21  0.3046  0.1232
σ = […]         S(0.3)    0.000  0.000  0.000  0.000  1.000  13.984  3361.80   283.17  0.3264  0.1340
                S(0.5)    0.000  0.000  0.000  0.000  1.000  15.991  3836.28   302.43  0.3400  0.1419
                CBS       0.053  0.161  0.726  0.043  0.018   0.346    46.69   119.74  0.0241  0.0712
                cumSeg    0.025  0.097  0.722  0.093  0.063   0.456   108.11    81.86  0.0557  0.0707

n = 10000,      HS(0.1)   0.002  0.079  0.916  0.004  0.000   0.086    93.09   119.69  0.0130  0.0425
K = 10,         HS(0.3)   0.000  0.025  0.957  0.017  0.000   0.043    86.32    81.78  0.0105  0.0397
λ_min = 50,     HS(0.5)   0.000  0.012  0.950  0.038  0.000   0.050    99.93    76.24  0.0097  0.0389
[…]             CBS       0.467  0.148  0.167  0.107  0.111   2.516  1356.25  6254.20  0.0877  0.1308
σ = […]         cumSeg    0.586  0.192  0.136  0.055  0.032   2.242   997.13  3005.71  0.0433  0.0906

Table 7. Simulations with distributed errors and $C = 200$. Columns from left to right:
setting, method, proportions of $\hat{K} - K$ and averages of the corresponding error criteria.
HS($\alpha$) and S($\alpha$) denote H-SMUCE and SMUCE at significance level $\alpha$, respectively.


and have the identity
\[
q_k = F_k^{-1}\left(q_{k,U}\right). \tag{C.2}
\]
Then, the events $U_k > q_{k,U}$ and $T_k > q_k$ are equivalent and therefore the vector $q_U = (q_{1,U}, \ldots, q_{d_n,U})$ satisfies similar conditions to the equations (2.3) and (2.5), i.e.
\[
1 - \mathbb{P}\left(U_1 \le q_{1,U}, \ldots, U_{d_n} \le q_{d_n,U}\right) = \alpha \tag{C.3}
\]
and
\[
\frac{1 - \mathbb{P}\left(U_1 \le q_{1,U}\right)}{\beta_1} = \cdots = \frac{1 - \mathbb{P}\left(U_{d_n} \le q_{d_n,U}\right)}{\beta_{d_n}}. \tag{C.4}
\]
The following bounds can be interpreted as a weighted version of the Bonferroni inequality.

Lemma C.1. $q_{k,U} \le 1 - \alpha\beta_k$ for $k = 1, \ldots, d_n$.

Proof. We have $\mathbb{P}(U_j \le q_{j,U}) = q_{j,U}$ for $j = 1, \ldots, d_n$, since $U_j$ is uniformly distributed. Moreover, it follows from condition (C.4) that $1 - q_{j,U} = (1 - q_{k,U})\beta_j/\beta_k$. Combining this with equation (C.3) and $\sum_{j=1}^{d_n} \beta_j = 1$ yields
\[
\alpha = 1 - \mathbb{P}\left(U_1 \le q_{1,U}, \ldots, U_{d_n} \le q_{d_n,U}\right)
\le \sum_{j=1}^{d_n} \mathbb{P}\left(U_j > q_{j,U}\right)
= \sum_{j=1}^{d_n} \left(1 - q_{j,U}\right)
= \sum_{j=1}^{d_n} \left(1 - q_{k,U}\right)\frac{\beta_j}{\beta_k}
= \frac{1 - q_{k,U}}{\beta_k},
\]
which proves the assertion. □
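As a numerical illustration of Lemma C.1, the following minimal sketch solves the level condition $1 - \mathbb{P}(U_1 \le q_{1,U}, \ldots, U_{d_n} \le q_{d_n,U}) = \alpha$ together with the balancing condition $(1 - q_{k,U})/\beta_k = \text{const}$ and checks the bound $q_{k,U} \le 1 - \alpha\beta_k$. It assumes that the $U_k$ are independent, which is an assumption made only for this example (the lemma itself does not need it); the function name and the bisection solver are hypothetical choices, not part of the paper.

```python
import math


def uniform_critical_values(alpha, weights, tol=1e-12):
    """Solve the level and balancing conditions for q_{k,U}.

    Assumption (for this sketch only): U_1, ..., U_d are independent,
    so P(U_1 <= q_1, ..., U_d <= q_d) = prod_k q_k.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")

    # Balancing condition: 1 - q_{k,U} = c * beta_k for a common constant c.
    def level(c):
        return 1.0 - math.prod(1.0 - c * b for b in weights)

    # level is increasing in c on [0, 1 / max(weights)], so bisection applies.
    lo, hi = 0.0, 1.0 / max(weights)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if level(mid) < alpha:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return [1.0 - c * b for b in weights]


alpha = 0.1
weights = [0.5, 0.3, 0.2]
q_u = uniform_critical_values(alpha, weights)
# Weighted Bonferroni bound of Lemma C.1: q_{k,U} <= 1 - alpha * beta_k.
print(all(q <= 1.0 - alpha * b + 1e-9 for q, b in zip(q_u, weights)))
```

Under independence the common constant exceeds $\alpha$ only slightly, which shows that the weighted Bonferroni bound is close to sharp in this case.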


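Lemma C.2 bounds the quantile function of an F-distribution with $(1, c)$ degrees of freedom.

Lemma C.2 (Bounds on the F-quantiles). Let $F_{1,c}^{-1}(y)$ be the quantile function of an F-distribution with $(1, c)$ degrees of freedom, then
\[
c\left[\left(1 - y^2\right)^{-\frac{\pi}{2c}} - 1\right] \le F_{1,c}^{-1}(y) \le c\left[\left(1 - y^2\right)^{-\frac{4}{2c-1}} - 1\right].
\]

Proof. We have from (Fujikoshi and Mukaihata, 1993, Theorem 4.2) that
\[
c\left[\exp\left(\frac{\left(\chi_1^2\right)^{-1}(y)}{c}\right) - 1\right] \le F_{1,c}^{-1}(y) \le c\left[\exp\left(\frac{\left(\chi_1^2\right)^{-1}(y)}{c - \frac{1}{2}}\right) - 1\right],
\]
with $(\chi_1^2)^{-1}(y)$ the quantile function of the chi-squared distribution with one degree of freedom. Moreover, we obtain for all $y \ge 0$
\[
\mathbb{P}\left(\chi_1^2 \le y\right) = \mathbb{P}\left(-\sqrt{y} \le Z \le \sqrt{y}\right) = 2\Phi\left(\sqrt{y}\right) - 1
\quad\text{and hence}\quad
\left(\chi_1^2\right)^{-1}(y) = \left[\Phi^{-1}\left(\frac{y + 1}{2}\right)\right]^2,
\]
where $\Phi^{-1}(y)$ is the quantile of the standard gaussian distribution. Furthermore, we have from (Johnson et al., 1994, (13.48), p. 115) that for $x \ge 0$
\[
\frac{1}{2}\left[1 + \left(1 - \exp\left(-\frac{x^2}{2}\right)\right)^{1/2}\right] \le \Phi(x) \le \frac{1}{2}\left[1 + \left(1 - \exp\left(-\frac{2x^2}{\pi}\right)\right)^{1/2}\right]
\]
and so for the quantile function one finds
\[
-\frac{\pi}{2}\log\left(1 - (2y - 1)^2\right) \le \left[\Phi^{-1}(y)\right]^2 \le -2\log\left(1 - (2y - 1)^2\right).
\]
Combining the formulas proves the assertion. □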
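The Gaussian step of this proof can be checked numerically. The sketch below evaluates the two-sided bound $-\frac{\pi}{2}\log(1 - (2y-1)^2) \le [\Phi^{-1}(y)]^2 \le -2\log(1 - (2y-1)^2)$ for $1/2 < y < 1$, using only the Python standard library. The function name is hypothetical, and the inequality is written as read from the proof above rather than taken from any implementation of the authors.

```python
import math
from statistics import NormalDist


def squared_gaussian_quantile_bounds(y):
    """Return (lower, exact, upper) for (Phi^{-1}(y))^2, assuming 1/2 < y < 1."""
    t = -math.log(1.0 - (2.0 * y - 1.0) ** 2)
    exact = NormalDist().inv_cdf(y) ** 2
    return (math.pi / 2.0) * t, exact, 2.0 * t


for y in (0.6, 0.9, 0.975, 0.999):
    lower, exact, upper = squared_gaussian_quantile_bounds(y)
    # Each line shows the exact squared quantile sandwiched between the bounds.
    print(f"y={y}: {lower:.4f} <= {exact:.4f} <= {upper:.4f}")
```

The lower bound is nearly exact for $y$ close to $1/2$, while both bounds grow like $-\log(1 - y)$ as $y \to 1$, which is what produces the logarithmic critical values in Lemma 3.1.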


Proof of Lemma 3.1 First of all, (C.2) and the equation \T>k\ = \p2 yields 


Qk = ( gJt' 


,Pk\ ^ J _ ^-1 




1,2'=-! 1 ^k,U 


[n2-'=J ^ 


— -^ 1 , 2 "''-! [^k,U 


-1 


2'“/n 


Moreover, it follows from the Lemmas C.l and C.2 that 

® < Ri-i {&) ((1 - 


< (2^ - 1) 


(1 - 




<2'^ 


-I 
Applying Bernoulli's inequality $(1 - x)^c \le 1 - cx$ gives


qk<T 


1 - (1 - a(3kf 


\ 2'=+l-3 ^ 

< 2^ 

f2>^a/3k\ ^'=+^-3 ^ 

/ 


V n y 


Moreover, for $x, c > 0$ the inequality $c^x \le 1 + 2x\log(c)$ holds whenever $x\log(c) \le 1$.
Together with the assumption $k \ge 2$ we finally obtain


qk<r 




n 


if 


2fc+l-3 


2 -" log 


2^+1 

< A—r— - log 

- 2*^+1 - 3 


n 




< 8 log 


n 




n \ 1 2^+1 -3^1 

YFY^k) ~ 2 2^+1 - 2 ‘ 


□ 


C.3. Exponential deviation bounds. For the subsequent proofs we need a bound for the distribution function of a single test statistic $T_i^j$ from (1.5), which is in our setting a bound for the c.d.f. of a non-central F-distribution.

Lemma C.3. Let $Y_1, \ldots, Y_n$ be i.i.d. gaussian random variables with expectation $m \in \mathbb{R}$ and variance $s^2 > 0$. Let $x_+ := \max(x, 0)$. Then, for any $\delta \ne 0$, $q > 0$
\[
\mathbb{P}\left(T_1^n(Y, m + \delta) \le q\right)
\le \min_{z > 0}\left[\exp\left(-\frac{1}{2}\left(\frac{\Delta\sqrt{n}}{2} - \frac{q(1 + z)}{\Delta\sqrt{n}}\right)_+^2\right) + \exp\left(-(n - 1)\frac{z - \log(1 + z)}{2}\right)\right], \tag{C.5}
\]
where $\Delta := |\delta|/s$.

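Proof. Let $\tilde{T}_i^j(Y, m) := (j - i + 1)\left(\bar{Y}_{ij} - m\right)^2 / s^2$. Then,
\[
T_1^n(Y, m + \delta) = \frac{\tilde{T}_1^n(Y, m + \delta)}{\hat{s}_{1n}^2 / s^2}.
\]
The statistics $\hat{s}_{1n}^2 / s^2$ and $\tilde{T}_1^n(Y, m + \delta)$ are independent, since $\tilde{T}_1^n(Y, m + \delta)$ depends only on the mean $\bar{Y}_{1n}$. Hence, for all $z > 0$
\begin{align*}
\mathbb{P}\left(T_1^n(Y, m + \delta) \le q\right)
&= \mathbb{P}\left(\tilde{T}_1^n(Y, m + \delta) \le q\,\frac{\hat{s}_{1n}^2}{s^2}\right) \\
&= \mathbb{P}\left(\tilde{T}_1^n(Y, m + \delta) \le q\,\frac{\hat{s}_{1n}^2}{s^2} \,\Big|\, \frac{\hat{s}_{1n}^2}{s^2} \le 1 + z\right)\mathbb{P}\left(\frac{\hat{s}_{1n}^2}{s^2} \le 1 + z\right) \\
&\quad + \mathbb{P}\left(\tilde{T}_1^n(Y, m + \delta) \le q\,\frac{\hat{s}_{1n}^2}{s^2} \,\Big|\, \frac{\hat{s}_{1n}^2}{s^2} > 1 + z\right)\mathbb{P}\left(\frac{\hat{s}_{1n}^2}{s^2} > 1 + z\right) \\
&\le \mathbb{P}\left(\tilde{T}_1^n(Y, m + \delta) \le q(1 + z)\right) + \mathbb{P}\left(\frac{\hat{s}_{1n}^2}{s^2} > 1 + z\right) \\
&\le \exp\left(-\frac{1}{2}\left(\frac{\Delta\sqrt{n}}{2} - \frac{q(1 + z)}{\Delta\sqrt{n}}\right)_+^2\right) + \exp\left(-(n - 1)\frac{z - \log(1 + z)}{2}\right).
\end{align*}
The first term of the last inequality follows from (Frick et al., 2014, Lemma 7.3 and the proof) and the second from (Spokoiny and Zhilova, 2013, Theorem 2.1), since $(n - 1)\hat{s}_{1n}^2 / s^2 \sim \chi_{n-1}^2$.

It remains to show that the minimum in (C.5) is attained for some $z > 0$. The function $z \mapsto \left(\frac{\Delta\sqrt{n}}{2} - \frac{q(1 + z)}{\Delta\sqrt{n}}\right)_+^2$ is strictly monotonically decreasing for $z > 0$ until the function value zero is attained for some finite $z$. The function $(n - 1)(z - \log(1 + z))$ is zero for $z = 0$ and strictly monotonically increasing on $[0, \infty)$. Therefore, the two continuous functions intersect and the minimum is attained for some $z > 0$. □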

The minimum in the last lemma cannot be determined analytically, but it can be computed numerically. In Lemma C.4 we estimate the right hand side further to obtain an explicit exponential bound.
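To illustrate the numerical computation, the sketch below evaluates the right-hand side of (C.5), read here as $\min_{z>0}\big[\exp(-\tfrac12(\tfrac{\Delta\sqrt n}{2} - \tfrac{q(1+z)}{\Delta\sqrt n})_+^2) + \exp(-(n-1)\tfrac{z - \log(1+z)}{2})\big]$, by a plain grid search and compares it with an explicit bound of the form $2\exp(-\tfrac{1}{48}(\sqrt n\Delta - \sqrt{2q})_+^2)$. Both formulas are assumptions of this example, reconstructed from the surrounding text. The grid search, the search range `z_max` and all function names are hypothetical choices and not the authors' implementation.

```python
import math


def c5_objective(z, n, delta_ratio, q):
    """Bracket in (C.5) for a fixed z > 0; delta_ratio stands for Delta = |delta| / s."""
    root_n = math.sqrt(n)
    gap = max(delta_ratio * root_n / 2.0 - q * (1.0 + z) / (delta_ratio * root_n), 0.0)
    mean_term = math.exp(-0.5 * gap ** 2)
    variance_term = math.exp(-0.5 * (n - 1) * (z - math.log1p(z)))
    return mean_term + variance_term


def c5_bound(n, delta_ratio, q, z_max=10.0, steps=20000):
    """Approximate the minimum over z > 0 by a grid search on (0, z_max]."""
    return min(
        c5_objective(z_max * i / steps, n, delta_ratio, q)
        for i in range(1, steps + 1)
    )


def explicit_bound(n, delta_ratio, q):
    """Explicit exponential bound, assumed to require n >= 4 and q / n <= 1/8."""
    gap = max(math.sqrt(n) * delta_ratio - math.sqrt(2.0 * q), 0.0)
    return 2.0 * math.exp(-gap ** 2 / 48.0)


n, delta_ratio, q = 100, 1.0, 5.0
print(c5_bound(n, delta_ratio, q), explicit_bound(n, delta_ratio, q))
```

In this example the numerically minimized bound is much smaller than the explicit one, which reflects that the explicit bound trades sharpness for a closed form.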

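Lemma C.4. Let $Y_1, \ldots, Y_n$, $n \ge 4$, be i.i.d. gaussian random variables with expectation $m \in \mathbb{R}$ and variance $s^2 > 0$, then we have for all $q > 0$ with
\[
\frac{q}{n} \le \frac{1}{8} \tag{C.6}
\]
as well as for all $\delta \ne 0$ and $\Delta := |\delta|/s$ the bound
\[
\mathbb{P}\left(T_1^n(Y, m + \delta) \le q\right) \le 2\exp\left(-\frac{1}{48}\left(\sqrt{n}\Delta - \sqrt{2q}\right)_+^2\right). \tag{C.7}
\]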


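Proof. Let $z > 0$ be arbitrary, but fixed. Then, it follows from Lemma C.3 that
\begin{align*}
\mathbb{P}\left(T_1^n(Y, m + \delta) \le q\right)
&\le \exp\left(-\frac{1}{2}\left(\frac{\Delta\sqrt{n}}{2} - \frac{q(1 + z)}{\Delta\sqrt{n}}\right)_+^2\right) + \exp\left(-(n - 1)\frac{z - \log(1 + z)}{2}\right) \\
&\le 2\exp\left(-\min\left[\frac{1}{2}\left(\frac{\Delta\sqrt{n}}{2} - \frac{q(1 + z)}{\Delta\sqrt{n}}\right)_+^2,\ (n - 1)\frac{z - \log(1 + z)}{2}\right]\right).
\end{align*}
The inequality
\[
z - \log(1 + z) \ge \frac{1}{2}\,\frac{z^2}{1 + z} \ge \frac{1}{4}\min\left(z^2, z\right)
\]
yields
\begin{align*}
\mathbb{P}\left(T_1^n(Y, m + \delta) \le q\right)
&\le 2\exp\left(-\min\left[\frac{n}{8}\left(\Delta - \frac{2q(1 + z)}{\Delta n}\right)_+^2,\ \frac{1}{8}(n - 1)\min\left(z^2, z\right)\right]\right) \\
&\le 2\exp\left(-\frac{n - 1}{8}\min\left[\left(\Delta - \frac{2q(1 + z)}{\Delta n}\right)_+^2,\ \min\left(z^2, z\right)\right]\right).
\end{align*}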

Now, we minimize the r.h.s. in $z > 0$. The functions $z$ and $z^2$ are both increasing, the function $\left(\Delta - 2q(1 + z)/(\Delta n)\right)_+^2$ in contrast is decreasing in $z$. Therefore, both inner minima are attained and by solving the corresponding quadratic equations (note that we have to take the solution with $\Delta - 2q(1 + z)/(\Delta n) > 0$) we get

P(Tf(F,m + (5) <q) 


<2 exp I — (n — 1) min 




l + 4|i (A-li). 


1 + ^ 
^ An 


2(li)' 


Using the inequality $\sqrt{1 + 4x} \le 1 + 2x - 2x^2 + 4x^3$ for all $x \ge -1/4$ with $x = 2q/(\Delta n)\left(\Delta - 2q/(\Delta n)\right)_+$ we find

P(T/(y,m + 5) <q) 


<2 exp 


1 n — 1 
8 n 


-nmm 


1 + S 




2q 


An 


2q 


1 - 2^ A-^ 


An 


+J 


Next, we consider the two terms in the minimum separately. We assume w.l.o.g. that $\sqrt{2q}/(\Delta\sqrt{n}) < 1$, since otherwise the r.h.s. in (C.7) is two. For the first term we distinguish the cases $2q > \Delta n$ and $2q \le \Delta n$. If $2q \le \Delta n$ is satisfied, then


n 


1 + S 


>-nlA-Af =^VSA-^) >i(vTA-V^ 


1 

4 ’ 


An 


For the other case, when $2q > \Delta n$ holds, we obtain with $q/n \le 1/8$


n 


A-a 

2q 


1 

>-n 


1 _i_ 1 —A \ Al. 

^ ^ An ' ■ ' 


2q 


1 / nA^ 


= -n 


An 


4 \ 2q 


1 n 

■ 4 ^ 
For the second term it follows with $q/n \le 1/8$ that


I A 


This yields 


1 . 2 ^ 

An \ An 






l_4i (l-A_) 

n\ A2n/ 


> (^^/nA - \- 


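\[
\mathbb{P}\left(T_1^n(Y, m + \delta) \le q\right)
\le 2\exp\left(-\frac{1}{32}\,\frac{n - 1}{n}\left(\sqrt{n}\Delta - \sqrt{2q}\right)_+^2\right)
\le 2\exp\left(-\frac{1}{48}\left(\sqrt{n}\Delta - \sqrt{2q}\right)_+^2\right).
\]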


□ 


C.4. Proofs of Section 3.


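Proof of Theorem 3.3. The estimated number of change-points $\hat{K}$ is by its definition in (3.3) equal to the minimal number of change-points of all feasible functions. Therefore, all functions with the true number of change-points (or fewer change-points) have to be infeasible, if the number of change-points is overestimated. Hence, by (2.3)
\begin{align*}
\sup_{(\mu,\sigma^2) \in \mathcal{S}} \mathbb{P}_{(\mu,\sigma^2)}\left(\hat{K} > K\right)
&\le \sup_{(\mu,\sigma^2) \in \mathcal{S}} \mathbb{P}_{(\mu,\sigma^2)}\left(\max_{[i/n,j/n] \in \mathcal{D}(\mu)}\left[T_i^j\left(Y, \mu([i/n, j/n])\right) - q_{ij}\right] > 0\right) \\
&\le \mathbb{P}_{(0,1)}\left(\max_{[i/n,j/n] \in \mathcal{D}}\left[T_i^j(Y, 0) - q_{ij}\right] > 0\right) = \alpha,
\end{align*}
where the last inequality follows from $\mathcal{D}(\mu) \subset \mathcal{D}$ and the fact that the distribution of $T_i^j(Y, \mu([i/n, j/n]))$ does not depend on $\mu(\cdot)$ and $\sigma(\cdot)$, as these are constant on intervals in $\mathcal{D}(\mu)$. □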














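Proof of Theorem 3.4. First of all, we show that it is enough to prove the result for $\mu \equiv 0$ and $\sigma^2 \equiv 1$ and hence $K = 0$. We have
\begin{align*}
\sup_{(\mu,\sigma^2) \in \mathcal{S}} \mathbb{P}_{(\mu,\sigma^2)}\left(\hat{K} > K + 2k\right)
&= \sup_{(\mu,\sigma^2) \in \mathcal{S}} \mathbb{P}_{(\mu,\sigma^2)}\left(\max_{[i/n,j/n] \in \mathcal{D}(\hat\mu)}\left[T_i^j\left(Y, \hat\mu([i/n, j/n])\right) - q_{ij}\right] > 0 \ \ \forall\, \hat\mu \in \mathcal{M} \text{ s.t. } |\mathcal{I}(\hat\mu)| \le K + 2k\right) \\
&\le \sup_{(\mu,\sigma^2) \in \mathcal{S}} \mathbb{P}_{(\mu,\sigma^2)}\left(\max_{[i/n,j/n] \in \mathcal{D}(\hat\mu)}\left[T_i^j\left(Y, \hat\mu([i/n, j/n])\right) - q_{ij}\right] > 0 \ \ \forall\, \hat\mu \in \mathcal{M} \text{ s.t. } \mathcal{I}(\mu) \subset \mathcal{I}(\hat\mu),\ |\mathcal{I}(\hat\mu)| \le K + 2k\right) \\
&\le \mathbb{P}_{(0,1)}\left(\max_{[i/n,j/n] \in \mathcal{D}(\hat\mu)}\left[T_i^j\left(Y, \hat\mu([i/n, j/n])\right) - q_{ij}\right] > 0 \ \ \forall\, \hat\mu \in \mathcal{M} \text{ s.t. } |\mathcal{I}(\hat\mu)| \le 2k\right) \\
&= \mathbb{P}_{(0,1)}\left(\hat{K} > 2k\right),
\end{align*}
where the last inequality follows from the same argument as in the proof of Theorem 3.3. Now, we define $R_0 := 0$ and iteratively
\[
R_{k+1} := \min\left\{t > R_k : \exists\, s \text{ s.t. } R_k < s \le t \text{ and } [s/n, t/n] \in \mathcal{D},\ T_s^t(Y, 0) > q_{\log_2(t - s + 1)}\right\},
\]
with the convention $\min \emptyset = \infty$. Then,
\[
\mathbb{P}_{0,1}\left(R_{k+1} \le n \mid R_1 = t\right) \le \mathbb{P}_{0,1}\left(R_k \le n\right) \quad \text{for all } t \in \{1, \ldots, n\},
\]
since for the l.h.s. the remaining $k$ rejections $R_2, \ldots, R_{k+1}$ have to be in $\{t + 1, \ldots, n\}$ instead of $\{1, \ldots, n\}$. It follows
\begin{align*}
\mathbb{P}_{0,1}\left(\hat{K} > 2k\right) \le \mathbb{P}_{0,1}\left(R_{k+1} \le n\right)
&= \sum_{t=1}^{n} \mathbb{P}_{0,1}\left(R_{k+1} \le n \mid R_1 = t\right)\mathbb{P}_{0,1}\left(R_1 = t\right) \\
&\le \mathbb{P}_{0,1}\left(R_1 \le n\right)\mathbb{P}_{0,1}\left(R_k \le n\right) \le \cdots \le \mathbb{P}_{0,1}\left(R_1 \le n\right)^{k+1} \le \alpha^{k+1},
\end{align*}
where the last inequality is given by Theorem 3.3. It follows
\begin{align*}
\sup_{(\mu,\sigma^2) \in \mathcal{S}} \mathbb{E}_{(\mu,\sigma^2)}\left[\left(\hat{K} - K\right)_+\right]
&= \sup_{(\mu,\sigma^2) \in \mathcal{S}} \sum_{k=0}^{\infty} \mathbb{P}_{(\mu,\sigma^2)}\left(\hat{K} - K > k\right) \\
&\le \sup_{(\mu,\sigma^2) \in \mathcal{S}} 2\sum_{k=0}^{\infty} \mathbb{P}_{(\mu,\sigma^2)}\left(\hat{K} - K > 2k\right)
\le 2\sum_{k=0}^{\infty} \alpha^{k+1} = \frac{2\alpha}{1 - \alpha}.
\end{align*}
□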
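The final step of this proof is a geometric series: if the probability of overestimating by more than $2k$ change-points is at most $\alpha^{k+1}$, the expected overestimation is at most $2\sum_{k \ge 0}\alpha^{k+1} = 2\alpha/(1 - \alpha)$. A small sketch (the function name is hypothetical) confirms the closed form numerically and shows how small the bound is for typical levels.

```python
def overestimation_expectation_bound(alpha, terms=200):
    """Partial sum of 2 * alpha**(k + 1) for k = 0, ..., terms - 1."""
    return sum(2.0 * alpha ** (k + 1) for k in range(terms))


for alpha in (0.01, 0.1, 0.5):
    partial = overestimation_expectation_bound(alpha)
    closed_form = 2.0 * alpha / (1.0 - alpha)
    # The partial sum agrees with the closed form up to rounding.
    print(alpha, partial, closed_form)
```

For $\alpha = 0.1$ the bound is $2/9$, so on average fewer than a quarter of a change-point is added in excess.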


The following theorem is a sharper version of Theorem 3.5 that shows different probabilities for the
detection of the change-points.
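Theorem C.5 (Underestimation control II). Let $\lambda_j := \tau_{j+1} - \tau_j$ and $k_{n,j} := \lfloor \log_2(n\lambda_j/4) \rfloor$, $j = 0, \ldots, K$, as well as $\delta_j := |m_j - m_{j-1}|$ and
\begin{align*}
\eta_j := {}& \left[1 - 3\exp\left(-\frac{1}{48}\left(\sqrt{\frac{n\lambda_{j-1}\delta_j^2}{32\sigma_{j-1}^2}} - \sqrt{16\log\left(\frac{8}{\lambda_{j-1}\alpha\beta_{k_{n,j-1}}}\right)}\right)_+^2\right)\right] \\
& \times \left[1 - 3\exp\left(-\frac{1}{48}\left(\sqrt{\frac{n\lambda_j\delta_j^2}{32\sigma_j^2}} - \sqrt{16\log\left(\frac{8}{\lambda_j\alpha\beta_{k_{n,j}}}\right)}\right)_+^2\right)\right],
\end{align*}
$j = 1, \ldots, K$. Under the assumptions of Theorem 3.3 and if $n\lambda_j \ge 32$ and
\[
(n\lambda_j)^{-1}\log\left(\frac{8}{\lambda_j\alpha\beta_{k_{n,j}}}\right) \le \frac{1}{512}
\]
are satisfied for all $j = 1, \ldots, K$, then
\[
\mathbb{P}_{(\mu,\sigma^2)}\left(\hat{K} < K\right) \le 1 - \prod_{j=1}^{K}\eta_j
\quad\text{and}\quad
\mathbb{E}_{(\mu,\sigma^2)}\left[\left(K - \hat{K}\right)_+\right] \le \sum_{j=1}^{K}\left(1 - \eta_j\right).
\]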


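Proof. For each $j = 1, \ldots, K$ we consider the disjoint intervals $I_j := [\tau_j - \lambda_{j-1}/2, \tau_j + \lambda_j/2)$ and split them into disjoint intervals $I_j^+ \cup I_j^- = I_j$ such that $\mu(t) = \mu^+$ $\forall\, t \in I_j^+$ and $\mu(t) = \mu^-$ $\forall\, t \in I_j^-$, with $\mu^+ := \max(m_{j-1}, m_j)$ and $\mu^- := \min(m_{j-1}, m_j)$. Without loss of generality we assume $\mu^+ = m_{j-1}$ and $\mu^- = m_j$ in the following. Then, there exist subintervals $J_j^+ \subset I_j^+$ and $J_j^- \subset I_j^-$ with $J_j^+, J_j^- \in \mathcal{D}$ that have length $\lambda_{j-1}^* := n^{-1}2^{\lfloor \log_2(n\lambda_{j-1}/4) \rfloor} = n^{-1}2^{k_{n,j-1}} \ge \lambda_{j-1}/8$, since $n|I_j^+| = n\lambda_{j-1}/2 \ge 3$, and $\lambda_j^* := n^{-1}2^{\lfloor \log_2(n\lambda_j/4) \rfloor} = n^{-1}2^{k_{n,j}} \ge \lambda_j/8$, since $n|I_j^-| = n\lambda_j/2 \ge 3$, respectively. It
follows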

(iF < iF) = 1 - (iF > iF) 

< 1 — P(/i,o- 2 ) {$ fi E C'(q), j E {1,... ,K} : fi is constant on Ij) 

< 1 - P(^^<^ 2 )(^V j G {1,... ,iF} : $rh< {mj_i + mj)/2 : T^+(y,m) < and 

trh> {mj_i + mj)/2:Tj- (Y, m) < ^ 

K 

< 1 - n {mj-i + mj)/2 : Tj+fY, rh) < and 

1=1 

$rh> {mj_i +mj)/2\ Tj-{Y,m) < 

where we used in the last inequality that the events are independent, since all intervals are
disjoint. We denote by $Z_1, \ldots, Z_n$ i.i.d. standard normally distributed random variables.
It follows once again from the independence due to disjoint intervals and from the
Lemmas 7.1 in (Frick et al., 2014), C.4 and 3.1 that


rh < {rrij-i + mj)/2 : Tj+{Y,m) < and 

$rh> (mj_i + mj)/2 : Tj-{Y,m) < 

1 - (3m< (mj_i + mj)/2 : Tj+(Y, m) < 

1 - P{k.,a^)(^ m > {mj_i + mj)/2 : Tj-{Y,m) < qk^ 


> 


X 

> 


since 


P(m,^2)(3 m < {mj-i + mj)/2 : Tj+{Y,m) < qk^^.^j 

< P(;,,a2) (yj+ < {rrij-i + mj)/2 or Tj+{Y, (m^.i + mj)/2) < qk^^.^ 

< P(/.,< 72 ) {yj+ < (m^-i + mj)/2) + {Tj+{Y, (m^.i + mj)/2) < qk^^._. 




< exp 


<3exp|-- 



and the second term by symmetry arguments. Moreover, it follows 


E 




<E(/i,t72) 


K 


K-K 


K 


El 

Li=i 


3m<{mj-i+mj)/2-.Tj+{Y,m)<qk^^^_-^ or 3»n>(mj_i+mj)/2:T^_ {Y,m)<qk^ . 


i=i 


□ 


Proof of Theorem 3.5. The proof is analogous to the proof of Theorem C.5, but with $I_j = [\tau_j - \lambda/2, \tau_j + \lambda/2)$. □
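Proof of Theorem 3.7. We prove the theorem with the Borel-Cantelli lemma. It follows from Theorems 3.3 and 3.5 that
\begin{align*}
\sup_{(\mu,\sigma^2) \in \mathcal{S}_{\Delta,\lambda}} \mathbb{P}_{(\mu,\sigma^2)}\left(\hat{K}_n \ne K\right)
&\le \sup_{(\mu,\sigma^2) \in \mathcal{S}_{\Delta,\lambda}} \mathbb{P}_{(\mu,\sigma^2)}\left(\hat{K}_n > K\right) + \sup_{(\mu,\sigma^2) \in \mathcal{S}_{\Delta,\lambda}} \mathbb{P}_{(\mu,\sigma^2)}\left(\hat{K}_n < K\right) \\
&\le \alpha_n + 1 - \left[1 - 3\exp\left(-\frac{1}{48}\left(\sqrt{\frac{n\lambda\Delta^2}{32}} - \sqrt{16\log\left(\frac{8}{\lambda\alpha_n\beta_{k_n,n}}\right)}\right)_+^2\right)\right]^{2K} \\
&\le \alpha_n + 6K\exp\left(-\frac{1}{48}\left(\sqrt{\frac{n\lambda\Delta^2}{32}} - \sqrt{16\log\left(\frac{8}{\lambda\alpha_n\beta_{k_n,n}}\right)}\right)_+^2\right),
\end{align*}
since under the given assumptions the conditions of Theorem 3.5 are satisfied. The upper bounds for the error probabilities are summable if (3.6) is satisfied. □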


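Lemma C.6 (Confidence set). Assume the setting and assumptions of Theorem 3.5 and let $C(q)$ be as in (3.7) with significance level $\alpha$ and weights $\beta_1, \ldots, \beta_{d_n}$. Let $\mathcal{S}_{\Delta,\lambda}$ be as in (3.4) with $\Delta, \lambda > 0$ arbitrary, but fixed, and $k_n := \lfloor \log_2(n\lambda/4) \rfloor$. If $n\lambda \ge 32$ and
\[
\frac{1}{n\lambda}\log\left(\frac{8}{\lambda\alpha\beta_{k_n}}\right) \le \frac{1}{512}
\]
hold, then uniformly in $\mathcal{S}_{\Delta,\lambda}$
\[
\mathbb{P}_{(\mu,\sigma^2)}\left(\mu \in C(q)\right) \ge 1 - \alpha - \left(1 - \eta^K\right),
\]
with $\eta$ as in Theorem 3.5.

Proof. It follows from the definition of $C(q)$ in (3.7) as well as from Theorems 3.3 and 3.5 that
\begin{align*}
\inf_{(\mu,\sigma^2) \in \mathcal{S}_{\Delta,\lambda}} \mathbb{P}_{(\mu,\sigma^2)}\left(\mu \in C(q)\right)
&= \inf_{(\mu,\sigma^2) \in \mathcal{S}_{\Delta,\lambda}} \mathbb{P}_{(\mu,\sigma^2)}\left(\max_{[i/n,j/n] \in \mathcal{D}(\mu)}\left[T_i^j\left(Y, \mu([i/n, j/n])\right) - q_{ij}\right] \le 0,\ \hat{K} = K\right) \\
&= \inf_{(\mu,\sigma^2) \in \mathcal{S}_{\Delta,\lambda}} \mathbb{P}_{(\mu,\sigma^2)}\left(\max_{[i/n,j/n] \in \mathcal{D}(\mu)}\left[T_i^j\left(Y, \mu([i/n, j/n])\right) - q_{ij}\right] \le 0,\ \hat{K} \ge K\right) \\
&\ge \inf_{(\mu,\sigma^2) \in \mathcal{S}_{\Delta,\lambda}} \mathbb{P}_{(\mu,\sigma^2)}\left(\max_{[i/n,j/n] \in \mathcal{D}(\mu)}\left[T_i^j\left(Y, \mu([i/n, j/n])\right) - q_{ij}\right] \le 0\right) - \sup_{(\mu,\sigma^2) \in \mathcal{S}_{\Delta,\lambda}} \mathbb{P}_{(\mu,\sigma^2)}\left(\hat{K} < K\right) \\
&\ge 1 - \alpha - \left(1 - \eta^K\right).
\end{align*}
□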
Proof of Theorem 3.8. The statement is a direct consequence of Lemma C.6. □


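Lemma C.7 (Change-point locations). Assume the setting of Lemma C.6. If $c_n$ is a sequence with $0 < c_n \le \lambda/2$ and $k_n := \lfloor \log_2(nc_n/2) \rfloor$ such that $nc_n \ge 16$ and
\[
\frac{1}{nc_n}\log\left(\frac{4}{c_n\alpha\beta_{k_n}}\right) \le \frac{1}{256} \tag{C.8}
\]
hold, then uniformly in $\mathcal{S}_{\Delta,\lambda}$
\[
\mathbb{P}_{(\mu,\sigma^2)}\left(\sup_{\hat\mu \in C(q_n)}\ \max_{\tau \in \mathcal{I}(\mu)}\ \min_{\hat\tau \in \mathcal{I}(\hat\mu)} |\hat\tau - \tau| > c_n\right)
\le 1 - \left[1 - 3\exp\left(-\frac{1}{48}\left(\sqrt{\frac{nc_n\Delta^2}{16}} - \sqrt{16\log\left(\frac{4}{c_n\alpha\beta_{k_n}}\right)}\right)_+^2\right)\right]^{2K}.
\]

Proof. Analogously to the proof of Theorem C.5 we have
\begin{align*}
& \sup_{(\mu,\sigma^2) \in \mathcal{S}_{\Delta,\lambda}} \mathbb{P}_{(\mu,\sigma^2)}\left(\sup_{\hat\mu \in C(q_n)}\ \max_{\tau \in \mathcal{I}(\mu)}\ \min_{\hat\tau \in \mathcal{I}(\hat\mu)} |\hat\tau - \tau| > c_n\right) \\
&\quad \le \sup_{(\mu,\sigma^2) \in \mathcal{S}_{\Delta,\lambda}} \mathbb{P}_{(\mu,\sigma^2)}\left(\exists\, j \in \{1, \ldots, K\} \text{ and } \hat\mu \in C(q_n) : \hat\mu \text{ is constant on } [\tau_j - c_n, \tau_j + c_n)\right) \\
&\quad \le 1 - \left[1 - 3\exp\left(-\frac{1}{48}\left(\sqrt{\frac{nc_n\Delta^2}{16}} - \sqrt{16\log\left(\frac{4}{c_n\alpha\beta_{k_n}}\right)}\right)_+^2\right)\right]^{2K}.
\end{align*}
□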


Proof of Theorem 3.9. For $n$ large enough such that (3.9) guarantees the assumption of
Lemma C.7 it follows


P(m,<x 2 ) sup max \Tj - if > 1 

\A6C(qJJ = l-,^ 


< P(^,o- 2 ) yK > A or 3 /t G C'(q„), j G {1, ..., A} : p is constant on [tj — Cn, Tj + cf 

< P(^,t, 2 ) (^k > + P(^,a 2 ) (3 p G (^(q^), j G {1, ..., A} : p is constant on [tj - Cn, Tj 

1 


< -|- 1 


l-3expl-- 


nCnlL."^ 

16 


16 log 


Cn(^n/Ik„,r. 



2Ks 

)JJ 

J 


The assertion follows from $\alpha_n \to 0$ and


lim - A/ 161 og 

n^oo V lO 


C-ri^^nflkn 


= 00 , 


whereby the latter is a direct consequence of (3.9).


□ 
The following theorem deals with the detection of a single vanishing bump against a noisy
background.

Theorem C.8 (Single vanishing bump). Assume the heterogeneous gaussian change- 
point model ( |1.2 ) with seguences of hump signals fin{t) := mo + and an{t) : = 

+ Sn^i„(t), where 5n ^ is a seguence of change-point sizes, > 0 a seguence 
of standard deviations on G V, which is a seguence of intervals with |J„| —)■ 0. Let 
kn := [log 2 (n|/„|)J and A„ := |5„|/sn be the seguence of the signal to noise ratios. Let 
{Kn)n, ttn and filin', ■ ■ ■ ,ldd„,n be as in Theorem 


3.10 


We further assume 


(C.9) > (4 + e„)-/^log(j7^, 

with possibly —)■ 0, but such that en\/— log(|/„|) —)■ oo and 


lim sup 


log {anldk„,n, 
e„^-log(|4|) 


^ 4’ 


(C.IO) 

(C.ll) 

(C.12) 

Then, 

(C.13) 


1 • • r ^ I I 

limint -——— 
n^oo log(?7,) 


> 64 and lim 

n —>00 


log (an/3fc„,n) 
n\lJ 


= 0 , 


lim Sn - = 00 and 

Vl/nl 


lim inf > 0, with /3min,n := min{/?i,„,.. .,/3d„,n}- 

n -^00 10g(/?min,n) 


lim > 0 = 1. 


Conditions (C.9) and (C.10) are the main assumptions of the theorem to detect the vanishing signal on $I_n$. We discussed them together with the conditions of Theorem 3.10 in Section 3.3. We also need the weak technical conditions (C.11) and (C.12) on the size of $|I_n^c|$ and the minimal weight $\beta_{\min,n}$ to ensure that the detection power on the complement is large enough, too. Condition (C.12) is for instance fulfilled by uniform weights $\beta_{1,n} = \cdots = \beta_{d_n,n} = 1/d_n$, but many other choices are possible, too. We further assumed $I_n \in \mathcal{D}$, otherwise we have to replace $I_n$ by the largest subinterval which is an element of the dyadic partition. Such an interval always exists and has at least length $n^{-1}2^{\lfloor \log_2(n|I_n|/2) \rfloor} \ge |I_n|/4$. Therefore, omitting the condition $I_n \in \mathcal{D}$ would not change the rate. It is possible to strengthen (C.13) further to $\lim_{n \to \infty} \mathbb{P}_{(\mu_n,\sigma_n^2)}(\hat{K}_n = K) = 1$ if we increase all constants a little bit.
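The claim that an arbitrary interval always contains a dyadic interval of at least a quarter of its length can be checked by brute force. The sketch below works on the index scale and assumes, for illustration only, that the dyadic partition consists of the aligned blocks $((i-1)2^k, i2^k]$; this alignment and the function name are assumptions of the example, since the definition of the partition lies outside this section.

```python
def largest_dyadic_block(a, b):
    """Length of the longest aligned block ((i-1)*2**k, i*2**k] inside the window (a, b].

    a and b are integers with a < b; the window holds the indices a+1, ..., b.
    """
    best = 0
    k = 0
    while 2 ** k <= b - a:
        size = 2 ** k
        # Smallest multiple of size that can be the right end of a block starting at or after a.
        first_end = -(-(a + size) // size) * size
        if first_end <= b:
            best = size
        k += 1
    return best


# Every window of length m >= 2 contains a block of length 2**floor(log2(m/2)) > m/4.
print(largest_dyadic_block(0, 5), largest_dyadic_block(1, 6), largest_dyadic_block(3, 5))
```

A window of length $m$ always contains an aligned block of length $2^k$ once $2^{k+1} \le m$, which is exactly the choice $k = \lfloor \log_2(m/2) \rfloor$ used in the text.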


Proof of Theorem C.8. We denote by $J_n$ the longest subinterval $J_n \subset I_n^c$ which is part of the dyadic partition. Such an interval always exists (at least for $n$ large enough), since $|I_n| \to 0$, and has at least length $|I_n^c|/8$. Moreover, let $k_n := \log_2(n|I_n|)$ and $l_n := \log_2(n|J_n|)$. Then, the Lemmas 7.1 in (Frick et al., 2014) and C.4 yield for any
$\theta_n > 0$


Jnn i^Kn > 0^ 

= lim 1 — 0 - 2 ) (/i is constant) 

> lim 1 - P(;,„,, 72 ) (3 m < mo + 6^^ : Tj^(Y,m) < qk„ or 3 m > mo + 9n ■- Tj„(Y,m) < qi„ 

n^oo 

> lim 1 - P(^„,a 2 ) (3 m < mo + : Ti^iY, rh) < qkj 

n—^■OO 

- P(Mn,a2) (3 m > mo + : Tj„(F, m) < qij 

- ^ < "^0 + On) - P(Mn,<x2) (T/jy, mo + On) < qkj 

- P(Mn.a 2 ) {Yj^ > mo + On) - P(M„.a 2 ) (Tj„(y, mo + On) < qij 

- ^ “ 2P(^„,^2) (T/„(y, mo + On) < qkj - 2P(^„,a2) {Tj„{Y, mo + On) < qi„) 

> Ito l-4exp(-l(r,Jt) -4exp(-l(r, M =1, 


if 


r/„ := 


- On 


- oo Fj^ ;= \/n\Jn\0n - a/^ oo. 


and if the conditions of Lemma C.4 are satisfied. This is the case, since n\In\ oo 


and n\Jn\ —>■ oo, because of (C.IO) and \In\ —)■ 0, as well as qk„/{n\In\) < 1/8 and 
0in/i^\'Jn\) < 1/8 hold at least for n large enough: The first one is a direct consequence 


of Lemma 3.1 and (C.IO) 


9fcn 

n\I„ 


Tog 


< 


\In\oirLf^kn,n 7^1 


n\In 


since then the assumptions of Lemma 3.1 are also fulfilled. The second inequality follows 


from Lemma 3.1, (C.IO) and (C.12) as well as the fact that |/„|/|J„| —)■ 0 


lim 

n^oo n Jr) 


< lim 

n^oo 


8 log 




'log 


Tlltlr), 


< lim - 

8 log 


\Jn \c>l-Ti&ln,r 


n / J-n 


I I 8 log 


Jn|ckn/5/cr} 


\^n\(y-nl^kn-,r, 


Jn. 


n\L 


0 , 


since then the assumptions of Lemma 3.1 are also fulfilled. 


We define now On = \/yi/^ via the equation 
for $0 < c < 1$. Then, it follows from Lemma 3.1 and from $\sqrt{x + y} \le \sqrt{x} + \sqrt{y}$ for $x, y \ge 0$
together with the assumptions of the theorem that


r,,. 


= Vn\In\Sl/sl - 

>^n\In\Al - V7n|4|/4 - y^l61og 


I ^nPkn-,r 






—)■ oo, 


since the conditions of Lemma 3.1 are satisfied, as shown above.
Moreover, we have Lj^ := v^n|Jn|6'„ - a/ 2^ = ^/\Jn\'yn - a/ 2 ^ oo if 


I Jn I dn 


2^7 


> 


I Jn\'^n 


16 log 


\Jn\oi-nyi^j 


—)• OO, 


where we used Lemma 3.1 again. Finally, it follows from the assumptions of the theorem
that $\liminf_{n \to \infty} |J_n| \ge \liminf_{n \to \infty} |I_n^c|/8 > 0$ and thus


I J n Idn 


\/\In\'ln 


^n\/| Jn 


log 


^n/^kn 


r“®\/‘°s(sjb 


> 




^n 0 kn ,r 


log 


^n^kn,r 




—)■ oo. 


log 




□ 


Proof of Theorem 3.10. It follows from Theorem 3.5 that

1 “iKn 


Pr 


Kn < Kn) < 1 — 




l-3exp ( -^(rn)+ 


< 6iFnexp ( (Tn''^ 


+ / ’ 


with 


T. _,!n\nAl 


16 log 


^n(^n(^kri,n J 


since the assumptions of Theorem 3.5 are satisfied by (3.11).


In case (1) it is enough to show $\Gamma_n \to \infty$, because $K_n$ is bounded. Finally, $\Gamma_n \to \infty$
follows from


nXnAl 


log 


oo. 


In case (2) for bounded $K_n$, $\Gamma_n \to \infty$ follows from


r„=,/!I^-,/i6iog 


> 


32 


XnOin/^k„ 


+ 


^32 732 

^ ( 


log (^ ) - v^Wiog f ^ - Tlewiog 


^n^kn 


\ 


enjlog ( 3 ;“ 1 “ 7^7 log 


OZ-nf^kn 


—?• OO. 


For unbounded $K_n$ we have $K_n \le 1/\lambda_n$. It follows


^nexp ( (r„)^ 


< exp log — - — 


A. 


C 


48 I 732 


A. 


log ( ^ ) + ^^enjlog ( ^ ) - V^Jlog 


732 


A. 


Oin(3kn,r 


< exp 




(^n/^k„,r 


0 . 


□ 


C.5. Proofs of Section I 


Proof of Theorem A.1. We prove the assertion with (van der Vaart, 2007, Theorem 5.9)
which states three conditions for the convergence of a Z-estimator. Note that the convergence
in probability can be replaced by almost sure convergence, if the assumptions hold
almost surely. We define

and 


dn 

k=2 

1 - Fi ( 0 i) 1 - 

Fk {9k) 

/5i 

Pk 

dn 

- (1 - a) + ^ 

k=2 

1 — Fm,i ( 6 * 1 ) 1 

— FM,k {9k) 

/5i 

(3k 

q and 9m ■= 

. Now, (A. 2 ) and (A.3) yield 


(Qm) < ^ ( 1 + 


dri 


{/3i,... 


mm 


= 0 ( 1 ) 


almost surely. In addition, Lemma 2.1 shows that the vector of critical values $q$ is unique. Moreover, $\sup_{\theta \in [0,\infty)^{d_n}} \|F_M(\theta) - F(\theta)\|$ and $\sup_{\theta_k \ge 0} \|F_{M,k}(\theta_k) - F_k(\theta_k)\|$ for all $k \in \{1, \ldots, d_n\}$ converge to zero almost surely. Thus, all assumptions of (van der Vaart, 2007, Theorem 5.9) are satisfied and the assertion follows. □

Proof of Lemma A.Sj The computation time for the bounds b^j and bij is 0{1) for every 
fixed interval $[i/n, j/n] \in \mathcal{D}$, since they depend only on the sums $\sum_{l=i}^{j} Y_l$ as well as $\sum_{l=i}^{j} Y_l^2$ and these can be obtained from (precomputed) cumulative sums. The computation time for the intersected bounds $\underline{B}_{ij}$ and $\overline{B}_{ij}$ is also $O(1)$ for a fixed interval $[i/n, j/n]$, since they can be computed iteratively. Therefore, the total time to compute the bounds is $O(n)$, since the dyadic partition contains fewer than $n$ intervals.

It follows from its iterative definition that the left limits $L_1, L_2, \ldots$ can be computed in $O(n)$. Therefore, the dynamic programming algorithm has cost $O\left(\sum_k (R_k - L_k + 1)(R_{k+1} - L_{k+1} + 1)\right)$ besides some linear costs, since for each point in the interval $[L_{k+1}, R_{k+1}]$ the optimal change-point in the interval $[L_k, R_k]$ has to be determined by computing the cost functional for each of these points. But, for a single interval the computation time for the restricted maximum likelihood estimator and for the cost functional is $O(1)$ if the constraints $\underline{B}_{ij}$ and $\overline{B}_{ij}$ are given, since the restricted maximum likelihood estimator and the cost functional depend besides these constraints again only on the sums $\sum_{l=i}^{j} Y_l$ and $\sum_{l=i}^{j} Y_l^2$.

This proves the assertion. □
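The constant-time argument rests on precomputed cumulative sums. The following sketch shows the idea for the local mean and variance on an interval. It assumes the unbiased variance estimator with divisor $j - i$, which is an assumption of this example, and the function names are hypothetical; it is not the implementation of the R-package.

```python
from itertools import accumulate


def prefix_sums(y):
    """Cumulative sums of y and of y**2, each with a leading zero, computed in O(n)."""
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))
    return s1, s2


def interval_mean_and_variance(s1, s2, i, j):
    """Mean and variance of y_i, ..., y_j (1-based, inclusive, j > i) in O(1)."""
    m = j - i + 1
    total = s1[j] - s1[i - 1]
    total_sq = s2[j] - s2[i - 1]
    mean = total / m
    # Assumed estimator: unbiased sample variance with divisor m - 1.
    variance = (total_sq - m * mean * mean) / (m - 1)
    return mean, variance


y = [1.0, 2.0, 4.0, 8.0]
s1, s2 = prefix_sums(y)
print(interval_mean_and_variance(s1, s2, 2, 4))
```

After the linear-time preprocessing, every interval query costs two subtractions per sum, independent of the interval length. For data with large offsets a numerically more stable variance formula may be preferable.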